classla/ParlaCAP-Topic-Classifier

Text Classification


TajaKuzmanPungersek committed 9 days ago

Commit 4f8054d · verified · 1 Parent(s): dfe75be

Update README.md

Files changed (1)

README.md +8 -10


@@ -26,27 +26,25 @@ Evaluation of the GPT model has shown that its annotation performance is
 comparable to those of human annotators.
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
-0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
-(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
-An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
-that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
 For end use scenarios, we recommend filtering out predictions based on the model's prediction confidence.
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
-With this approach, we annotate as Mix 8.6% of instances in the English test set,
-11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test set.
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 |    |   micro-F1 |   macro-F1 |   accuracy |
 |:---|-----------:|-----------:|-----------:|
-| EN |   0.780 |    0.779 |   0.779 |
-| RS |   0.749 |   0.743 |   0.749 |
-| HR |   0.724 |   0.726 |   0.724 |

 comparable to those of human annotators.
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set
+(app. 880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 For end use scenarios, we recommend filtering out predictions based on the model's prediction confidence.
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
+With this approach, we annotate as Mix 8.9% of instances in the English test set,
+11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian and Bosnian test sets.
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 |    |   micro-F1 |   macro-F1 |   accuracy |
 |:---|-----------:|-----------:|-----------:|
+| en |   0.761 |   0.758 |   0.761 |
+| sr |   0.749 |   0.743 |   0.749 |
+| hr |   0.724 |   0.726 |   0.724 |
+| bs |   0.686 |   0.680 |   0.686 |
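
The confidence filtering described in the README (predictions with confidence below 0.60 are annotated as "Mix", and performance is then reported on the remaining instances) could be sketched roughly as follows. This is a minimal illustration and not the authors' evaluation script: the function names, the `(label, confidence)` input format and the example labels are assumptions made for this sketch, and only the 0.60 threshold and the "Mix" label come from the README.

```python
MIX_THRESHOLD = 0.60  # threshold stated in the README


def apply_mix_threshold(predictions, threshold=MIX_THRESHOLD):
    """Replace labels predicted with confidence below the threshold by "Mix".

    `predictions` is a list of (label, confidence) pairs; this input
    format is an assumption of the sketch.
    """
    return [label if conf >= threshold else "Mix" for label, conf in predictions]


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def evaluate_without_mix(gold, predictions, threshold=MIX_THRESHOLD):
    """Report the share of "Mix" instances and scores on the remaining ones."""
    final = apply_mix_threshold(predictions, threshold)
    kept = [(g, p) for g, p in zip(gold, final) if p != "Mix"]
    kept_gold = [g for g, _ in kept]
    kept_pred = [p for _, p in kept]
    correct = sum(1 for g, p in kept if g == p)
    return {
        "mix_share": 1 - len(kept) / len(final) if final else 0.0,
        "accuracy": correct / len(kept) if kept else 0.0,
        "macro_f1": macro_f1(kept_gold, kept_pred),
    }


# Illustrative inputs only; the labels and confidences are made up.
gold = ["Health", "Education", "Health", "Defense"]
predictions = [("Health", 0.90), ("Education", 0.55),
               ("Education", 0.70), ("Defense", 0.60)]
result = evaluate_without_mix(gold, predictions)
```

A confidence of exactly 0.60 is kept here, since the README speaks of confidence "below 0.60". How macro-F1 treats labels that occur only among the predictions is also a choice of this sketch and may differ from the setup behind the reported table.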