TajaKuzmanPungersek committed
Commit 4f8054d (verified) · 1 Parent(s): dfe75be

Update README.md

Files changed (1):
  1. README.md (+8 -10)
README.md CHANGED
@@ -26,27 +26,25 @@ Evaluation of the GPT model has shown that its annotation performance is
  comparable to those of human annotators.

  The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
- 0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
- (880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
-
- An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
- that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
+ 0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set
+ (app. 880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).

  For end use scenarios, we recommend filtering out predictions based on the model's prediction confidence.

  When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".

- With this approach, we annotate as Mix 8.6% of instances in the English test set,
- 11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test set.
+ With this approach, we annotate as Mix 8.9% of instances in the English test set,
+ 11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian and Bosnian test sets.


  Performance of the model on the remaining instances (all instances not annotated as "Mix"):

  | | micro-F1 | macro-F1 | accuracy |
  |:---|-----------:|-----------:|-----------:|
- | EN | 0.780 | 0.779 | 0.779 |
- | RS | 0.749 | 0.743 | 0.749 |
- | HR | 0.724 | 0.726 | 0.724 |
+ | en | 0.761 | 0.758 | 0.761 |
+ | sr | 0.749 | 0.743 | 0.749 |
+ | hr | 0.724 | 0.726 | 0.724 |
+ | bs | 0.686 | 0.680 | 0.686 |
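As a usage illustration of the confidence filtering described in the updated section, below is a minimal sketch of how the 0.60 threshold and the evaluation on the remaining (non-"Mix") instances could be applied. The checkpoint id and helper names are placeholders for illustration, not the actual ParlaCAP repository id or API; it only assumes the model loads as a standard Hugging Face text-classification checkpoint.

```python
# Minimal sketch, assuming the model loads as a standard Hugging Face
# text-classification pipeline; the checkpoint id below is a placeholder.
from transformers import pipeline
from sklearn.metrics import accuracy_score, f1_score

classifier = pipeline("text-classification", model="path/to/ParlaCAP-checkpoint")

THRESHOLD = 0.60  # predictions below this confidence are annotated as "Mix"

def annotate(texts):
    """Predict a label per text; fall back to "Mix" below the confidence threshold."""
    predictions = classifier(texts)
    return [p["label"] if p["score"] >= THRESHOLD else "Mix" for p in predictions]

def evaluate_confident(y_true, y_pred):
    """Score only the confidently predicted instances, i.e. those not labelled "Mix"."""
    kept = [(t, p) for t, p in zip(y_true, y_pred) if p != "Mix"]
    truths, preds = zip(*kept)
    return {
        "micro-F1": f1_score(truths, preds, average="micro"),
        "macro-F1": f1_score(truths, preds, average="macro"),
        "accuracy": accuracy_score(truths, preds),
    }
```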