TajaKuzmanPungersek committed on
Commit dfe75be · verified · 1 Parent(s): 47519da

Update README.md

Files changed (1): README.md (+8 -5)
README.md CHANGED
@@ -25,8 +25,9 @@ following the [LLM teacher-student framework](https://ieeexplore.ieee.org/docume
 Evaluation of the GPT model has shown that its annotation performance is
 comparable to that of human annotators.
 
-The fine-tuned ParlaCAP model achieves 0.752 in macro-F1 on an English test set (440 instances from ParlaMint-GB 4.1, balanced by labels)
-and 0.694 in macro-F1 on a Croatian test set (440 instances from ParlaMint-HR 4.1, balanced by labels).
+The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
+(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 An additional evaluation on smaller samples from the Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
 that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
@@ -35,18 +36,20 @@ For end use scenarios, we recommend filtering out predictions based on the model
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix:
-- 8.6% of instances in the English test set
-- 11.4% of instances in the Croatian test set
+With this approach, we annotate as Mix 8.6% of instances in the English test set,
+11.4% of instances in the Croatian test set and 11.1% of instances in the Serbian test set.
+
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 |    | micro-F1 | macro-F1 | accuracy |
 |:---|---------:|---------:|---------:|
 | EN | 0.780 | 0.779 | 0.779 |
+| RS | 0.749 | 0.743 | 0.749 |
 | HR | 0.724 | 0.726 | 0.724 |
 
 
+
 ## Use
 
 To use the model:
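A minimal sketch of the intended usage, assuming ParlaCAP is a standard `transformers` text-classification checkpoint (the model id below is a placeholder, not the real repository name, and the label names are illustrative). It also applies the confidence rule described above, relabelling predictions with confidence below 0.60 as "Mix":

```python
try:
    # transformers is only needed to run the model itself; the
    # thresholding helper below works on its pipeline output format.
    from transformers import pipeline
except ImportError:
    pipeline = None

CONFIDENCE_THRESHOLD = 0.60  # threshold used in this card for the "Mix" label


def apply_mix_threshold(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Relabel predictions whose confidence falls below `threshold` as "Mix".

    `predictions` is a list of {"label": ..., "score": ...} dicts, the
    format returned by the transformers text-classification pipeline.
    """
    return [
        {"label": p["label"] if p["score"] >= threshold else "Mix",
         "score": p["score"]}
        for p in predictions
    ]


if __name__ == "__main__" and pipeline is not None:
    # "path/to/ParlaCAP" is a placeholder; substitute the actual model id.
    classifier = pipeline("text-classification", model="path/to/ParlaCAP")
    raw = classifier(["We must increase funding for public hospitals."])
    print(apply_mix_threshold(raw))
```

The thresholding is kept separate from the pipeline call so it can also be applied to batch predictions loaded from disk.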