Update README.md
README.md
CHANGED
@@ -26,27 +26,25 @@ Evaluation of the GPT model has shown that its annotation performance is
 comparable to those of human annotators.
 
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
-0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and
-(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
-
-An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
-that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set
+(app. 880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 For end use scenarios, we recommend filtering out predictions based on the model's prediction confidence.
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix 8.
-11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test
+With this approach, we annotate as Mix 8.9% of instances in the English test set,
+11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian and Bosnian test sets.
 
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 | | micro-F1 | macro-F1 | accuracy |
 |:---|-----------:|-----------:|-----------:|
-
-
-
+| en | 0.761 | 0.758 | 0.761 |
+| sr | 0.749 | 0.743 | 0.749 |
+| hr | 0.724 | 0.726 | 0.724 |
+| bs | 0.686 | 0.680 | 0.686 |
 
 
 
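The confidence filtering described in the README text above (predictions with confidence below 0.60 are annotated as "Mix") could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `(label, confidence)` input format and the example labels and scores are all assumptions made for this sketch. Only the 0.60 threshold and the "Mix" label come from the README.

```python
from typing import List, Sequence, Tuple

MIX_LABEL = "Mix"            # label named in the README
CONFIDENCE_THRESHOLD = 0.60  # threshold named in the README


def apply_mix_threshold(
    predictions: Sequence[Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Replace every prediction whose confidence is below the threshold with "Mix".

    `predictions` is a hypothetical list of (label, confidence) pairs,
    e.g. the top class and its softmax score for each instance.
    """
    return [
        label if confidence >= threshold else MIX_LABEL
        for label, confidence in predictions
    ]


def mix_share(labels: Sequence[str]) -> float:
    """Return the fraction of instances that ended up annotated as "Mix"."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == MIX_LABEL) / len(labels)


# Hypothetical predictions, invented for illustration only.
example_predictions = [
    ("Health", 0.91),
    ("Education", 0.42),       # below 0.60, becomes "Mix"
    ("Defense", 0.60),         # exactly 0.60 is not "below", so it is kept
    ("Macroeconomics", 0.59),  # below 0.60, becomes "Mix"
]

final_labels = apply_mix_threshold(example_predictions)
share_of_mix = mix_share(final_labels)
```

Evaluating only the instances whose final label is not "Mix" would then correspond to the "remaining instances" rows reported in the table.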