---
language:
- hr
license: cc-by-sa-4.0
---
# BERTic-Incorrect-Spelling-Annotator
This BERTic model is designed to annotate incorrectly spelled words in text. It uses the following labels:
- 0: the word is written correctly,
- 1: the word is written incorrectly.
## Model Output Example
Imagine we have the following Croatian text, which contains two intentionally misspelled words:

```
Model u tekstu prepoznije riječi u kojima se nalazaju pogreške .
```
If we convert the input data to the format expected by the BERTic model:

```
[CLS] model [MASK] u [MASK] tekstu [MASK] prepo ##znije [MASK] riječi [MASK] u [MASK] kojima [MASK] se [MASK] nalaza ##ju [MASK] pogreške [MASK] . [MASK] [SEP]
```
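The interleaving of `[MASK]` slots can be sketched as a simple preprocessing step. This is only an illustration: the real input is produced by the BERTic WordPiece tokenizer, which additionally splits unknown words into subword pieces (e.g. `prepo ##znije`).

```python
def to_masked_format(text: str) -> str:
    """Insert a [MASK] slot after every word and wrap with [CLS]/[SEP].

    Simplified sketch: lowercases and splits on whitespace only; the
    actual model input comes from the BERTic WordPiece tokenizer.
    """
    words = text.lower().split()
    body = " ".join(f"{w} [MASK]" for w in words)
    return f"[CLS] {body} [SEP]"
```

For example, `to_masked_format("Model u tekstu")` yields `[CLS] model [MASK] u [MASK] tekstu [MASK] [SEP]`.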
The model might return the following predictions (note: these predictions were chosen for demonstration, not reproducibility!):

```
Model 0 u 0 tekstu 0 prepoznije 1 riječi 0 u 0 kojima 0 se 0 nalazaju 1 pogreške 0 . 0
```
We can observe that the words prepoznije and nalazaju in the input sentence are spelled incorrectly, so the model marks them with label 1.
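Once the per-word labels are available, collecting the flagged words is straightforward. The helper below is a hypothetical sketch, not part of the released model code:

```python
def flag_misspelled(words, labels):
    # Keep the words whose predicted label is 1 (written incorrectly).
    return [w for w, lab in zip(words, labels) if lab == 1]

# The example sentence and the predictions shown above.
words = "model u tekstu prepoznije riječi u kojima se nalazaju pogreške .".split()
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
# flag_misspelled(words, labels) → ["prepoznije", "nalazaju"]
```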
## More details
Testing the model with generated test sets gives the following results:
- Precision: 0.9954
- Recall: 0.8764
- F1 Score: 0.9321
- F0.5 Score: 0.9691
Testing the model with test sets constructed from RAPUT 1.0, the Croatian corpus of non-professional written language produced by typical speakers and speakers with language disorders, gives the following results:
- Precision: 0.8213
- Recall: 0.3921
- F1 Score: 0.5308
- F0.5 Score: 0.6738
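For reference, the F-scores reported here follow from precision and recall via the standard F-beta formula; a quick illustrative check:

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    # F-beta weights recall beta times as much as precision;
    # beta=1 gives F1, beta=0.5 favours precision (F0.5).
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# f_beta(0.8213, 0.3921, 1.0) ≈ 0.5308 and
# f_beta(0.8213, 0.3921, 0.5) ≈ 0.6738, matching the table above.
```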
## Acknowledgement
The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.
## Authors
Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing this model.