TajaKuzmanPungersek committed on
Commit dfe75be · verified · 1 Parent(s): 47519da

Update README.md

Files changed (1): README.md (+8 -5)
README.md CHANGED
@@ -25,8 +25,9 @@ following the [LLM teacher-student framework](https://ieeexplore.ieee.org/docume
 Evaluation of the GPT model has shown that its annotation performance is
 comparable to that of human annotators.
 
-The fine-tuned ParlaCAP model achieves 0.752 in macro-F1 on an English test set (440 instances from ParlaMint-GB 4.1, balanced by labels)
-and 0.694 in macro-F1 on a Croatian test set (440 instances from ParlaMint-HR 4.1, balanced by labels).
+The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
+(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 An additional evaluation on smaller samples from the Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
 that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
@@ -35,18 +36,20 @@ For end use scenarios, we recommend filtering out predictions based on the model
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix:
-- 8.6% of instances in the English test set
-- 11.4% of instances in the Croatian test set
+With this approach, we annotate as Mix 8.6% of instances in the English test set,
+11.4% of instances in the Croatian test set and 11.1% of instances in the Serbian test set.
+
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 |    | micro-F1 | macro-F1 | accuracy |
 |:---|---------:|---------:|---------:|
 | EN | 0.780 | 0.779 | 0.779 |
+| RS | 0.749 | 0.743 | 0.749 |
 | HR | 0.724 | 0.726 | 0.724 |
 
 
+
 ## Use
 
 To use the model:
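A minimal sketch of the intended usage, assuming ParlaCAP is a standard `transformers` text-classification checkpoint (the model id below is a placeholder, not the real repository name, and the label names are illustrative). It also applies the confidence rule described above, relabelling predictions with confidence below 0.60 as "Mix":

```python
try:
    # transformers is only needed to run the model itself; the
    # thresholding helper below works on its pipeline output format.
    from transformers import pipeline
except ImportError:
    pipeline = None

CONFIDENCE_THRESHOLD = 0.60  # threshold used in this card for the "Mix" label


def apply_mix_threshold(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Relabel predictions whose confidence falls below `threshold` as "Mix".

    `predictions` is a list of {"label": ..., "score": ...} dicts, the
    format returned by the transformers text-classification pipeline.
    """
    return [
        {"label": p["label"] if p["score"] >= threshold else "Mix",
         "score": p["score"]}
        for p in predictions
    ]


if __name__ == "__main__" and pipeline is not None:
    # "path/to/ParlaCAP" is a placeholder; substitute the actual model id.
    classifier = pipeline("text-classification", model="path/to/ParlaCAP")
    raw = classifier(["We must increase funding for public hospitals."])
    print(apply_mix_threshold(raw))
```

The thresholding is kept separate from the pipeline call so it can also be applied to batch predictions loaded from disk.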