Update README.md
You can also download the pretrained model from here: [EstBERT_128]() [EstBERT_512]()
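
If you would rather load the checkpoints through the Hugging Face `transformers` library than download them manually, something along the following lines should work. This is a minimal sketch: the model id `tartuNLP/EstBERT` is an assumption, so substitute the actual Hub id or a local path to the downloaded checkpoint.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# "tartuNLP/EstBERT" is an assumed Hub id; replace it with the real id
# or a local path to the downloaded checkpoint.
tokenizer = AutoTokenizer.from_pretrained("tartuNLP/EstBERT")
model = AutoModelForMaskedLM.from_pretrained("tartuNLP/EstBERT")

# Quick sanity check: fill in a masked token in an Estonian sentence.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill(f"Tallinn on Eesti {tokenizer.mask_token}."))
```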
#### Dataset used to train the model

The EstBERT model is trained on data with sequence lengths of both 128 and 512. For training EstBERT, we used the [Estonian National Corpus 2017](https://metashare.ut.ee/repository/browse/estonian-national-corpus-2017/b616ceda30ce11e8a6e4005056b40024880158b577154c01bd3d3fcfc9b762b3/), which was the largest Estonian language corpus available at the time. It consists of four sub-corpora: the Estonian Reference Corpus 1990-2008, the Estonian Web Corpus 2013, the Estonian Web Corpus 2017, and the Estonian Wikipedia Corpus 2017.
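
Since the two variants target different maximum sequence lengths, long inputs should be truncated to the matching length at inference time. A small illustration, again assuming the `tartuNLP/EstBERT` id from above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tartuNLP/EstBERT")  # assumed id

text = "Eesti keel on soome-ugri keel. " * 100  # deliberately long input

# Truncate to the training length of the variant you use:
# 128 for EstBERT_128, 512 for EstBERT_512.
batch_128 = tokenizer(text, truncation=True, max_length=128)
batch_512 = tokenizer(text, truncation=True, max_length=512)
print(len(batch_128["input_ids"]), len(batch_512["input_ids"]))  # 128 512
```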
### Reference to cite
[Tanvir et al., 2021](https://aclanthology.org/2021.nodalida-main.2/)
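
For convenience, a BibTeX entry along the following lines can be used; the fields are reconstructed from the ACL Anthology listing above and should be verified against the official entry there.

```bibtex
@inproceedings{tanvir-etal-2021-estbert,
  title     = {{EstBERT}: A Pretrained Language-Specific {BERT} for {Estonian}},
  author    = {Tanvir, Hasan and Kittask, Claudia and Eiche, Sandra and Sirts, Kairit},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  year      = {2021},
  url       = {https://aclanthology.org/2021.nodalida-main.2/}
}
```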
### Why would I use it?
Overall, EstBERT performs better on part-of-speech (POS) tagging, named entity recognition (NER), rubric, and sentiment classification tasks than mBERT and XLM-RoBERTa. The comparative results can be found below.
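
To adapt the pretrained checkpoint to one of these downstream tasks, it can be loaded with a task-specific head and fine-tuned. Below is a hedged sketch for token classification (POS or NER): the label set is a made-up placeholder and the model id is the same assumption as above.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set for illustration; use the label inventory of
# your actual POS or NER dataset.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

tokenizer = AutoTokenizer.from_pretrained("tartuNLP/EstBERT")  # assumed id
model = AutoModelForTokenClassification.from_pretrained(
    "tartuNLP/EstBERT",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# From here, fine-tune with the Trainer API or a plain PyTorch loop on a
# token-labelled Estonian dataset.
```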