Update README.md
README.md CHANGED
@@ -25,8 +25,9 @@ following the [LLM teacher-student framework](https://ieeexplore.ieee.org/docume
 Evaluation of the GPT model has shown that its annotation performance is
 comparable to those of human annotators.
 
-The fine-tuned ParlaCAP model achieves 0.
-
+The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
+(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
 that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
@@ -35,18 +36,20 @@ For end use scenarios, we recommend filtering out predictions based on the model
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix
-
-
+With this approach, we annotate as Mix 8.6% of instances in the English test set,
+11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test set.
+
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 | | micro-F1 | macro-F1 | accuracy |
 |:---|-----------:|-----------:|-----------:|
 | EN | 0.780 | 0.779 | 0.779 |
+| RS | 0.749 | 0.743 | 0.749 |
 | HR | 0.724 | 0.726 | 0.724 |
 
 
+
 ## Use
 
 To use the model: