EstBERT128_sentiment

This model is a fine-tuned version of tartuNLP/EstBERT on the reduced version of the Estonian Valence corpus, where the items with Mixed labels were removed. The data (containing Positive, Negative and Neutral labels) was split into 70/10/20 train/dev/test splits.

It achieves the following results on the development split:

Loss: 2.2440
Accuracy: 0.7926

It achieves the following results on the test split:

Loss: 2.7633
Accuracy: 0.7479

How to use?

You can use this model with the Transformers pipeline for text classification.

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tartuNLP/EstBERT128_sentiment")
model = AutoModelForSequenceClassification.from_pretrained("tartuNLP/EstBERT128_sentiment")

# Wrap both in a text-classification pipeline and classify one Estonian sentence.
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "Viimastel nädalatel on üha valjemaks muutunud hääled, mis läbisegi süüdistavad regionaalminister Madis Kallast röövretke korraldamises rikastesse valdadesse ja teisalt tegevusetuses."
result = nlp(text)

print(result)

[{'label': 'negatiivne', 'score': 0.9999992847442627}]
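
The pipeline returns a list with one dictionary per input, each holding a label and a score, as in the output above. A minimal sketch of post-processing such results follows. The helper name summarize and the min_score cut-off of 0.9 are assumptions made for illustration; neither is part of the model or of Transformers.

```python
def summarize(texts, results, min_score=0.9):
    """Pair each input text with its prediction and flag low-confidence ones.

    `results` is expected to have the shape printed by the pipeline above:
    a list of {"label": str, "score": float} dictionaries, one per text.
    `min_score` is an arbitrary, hypothetical cut-off.
    """
    rows = []
    for text, pred in zip(texts, results):
        rows.append({
            "text": text,
            "label": pred["label"],
            "score": pred["score"],
            "confident": pred["score"] >= min_score,
        })
    return rows


# The prediction below is copied from the example output above;
# the text is a placeholder for the input sentence.
example = summarize(
    ["(input sentence)"],
    [{"label": "negatiivne", "score": 0.9999992847442627}],
)
print(example[0]["label"], example[0]["confident"])
```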

Model description

A single linear layer classifier is fit on top of the last layer [CLS] token representation of the EstBERT model. The model is fully fine-tuned during training.
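
As a rough illustration of the classification head described above, the sketch below applies one linear layer followed by a softmax to a [CLS] vector. It is a toy, pure-Python version with a made-up 4-dimensional vector and hand-picked weights; the actual model uses EstBERT's hidden size and learned parameters. The function name and the label order are assumptions.

```python
import math


def classify_cls(cls_vector, weights, bias):
    """Apply a linear layer and a softmax to a single [CLS] representation."""
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    # Subtract the maximum logit for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label order; the real mapping is stored in the model config.
labels = ["positive", "negative", "neutral"]
cls_vector = [1.0, 0.0, -1.0, 0.5]   # toy stand-in for the [CLS] vector
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -2.0, 0.0],
]
bias = [0.0, 0.0, 0.0]

probs = classify_cls(cls_vector, weights, bias)
predicted = labels[probs.index(max(probs))]
print(predicted)
```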

Intended uses & limitations

This model is intended to be used as is. We hope it proves useful, but we do not guarantee that it is fit for any particular purpose or that its predictions are accurate on new data.

Citation information

If you use this model, please cite:

@inproceedings{tanvir2021estbert,
  title={EstBERT: A Pretrained Language-Specific BERT for Estonian},
  author={Tanvir, Hasan and Kittask, Claudia and Eiche, Sandra and Sirts, Kairit},
  booktitle={Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  pages={11--19},
  year={2021}
}

Training and evaluation data

The model was trained and evaluated on the sentiment categories of the Estonian Valence corpus. The data was split into train/dev/test parts with 70/10/20 proportions.

The Estonian Valence corpus has four sentiment labels:

positive
negative
neutral
mixed

Following Pajupuu et al., 2016, the items with mixed labels were removed. Thus, the model was trained and evaluated on the reduced version of the dataset containing only three labels (positive, negative and neutral).
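
The data preparation described above (drop the mixed items, then split 70/10/20) can be sketched as follows. The shuffling, the seed, and the function name are assumptions: the source does not say how items were assigned to the splits, and the toy items only stand in for the real corpus.

```python
import random


def filter_and_split(items, seed=0):
    """Remove 'mixed' items, shuffle, and split 70/10/20 into train/dev/test."""
    kept = [(text, label) for text, label in items if label != "mixed"]
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(kept)
    n = len(kept)
    n_train = n * 70 // 100
    n_dev = n * 10 // 100
    train = kept[:n_train]
    dev = kept[n_train:n_train + n_dev]
    test = kept[n_train + n_dev:]  # the remainder, roughly 20%
    return train, dev, test


# Toy stand-in for the corpus: 12 items, 2 of them labelled 'mixed'.
toy_labels = ["positive"] * 4 + ["negative"] * 3 + ["neutral"] * 3 + ["mixed"] * 2
toy_items = [(f"text {i}", label) for i, label in enumerate(toy_labels)]

train, dev, test = filter_and_split(toy_items)
print(len(train), len(dev), len(test))
```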

Training procedure

The model was trained for a maximum of 100 epochs with early stopping. After every epoch, accuracy was calculated on the development set. If the development set accuracy did not improve for 20 epochs, training was stopped.
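
The stopping rule can be sketched in a few lines. The version below is an illustration rather than the training code actually used: the function name is hypothetical, and treating only a strict increase as an improvement is an assumption. The accuracies are the per-epoch development accuracies from the Training results table.

```python
def early_stopping(dev_accuracies, patience=20, max_epochs=100):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without a strict
    improvement in development accuracy, or when `max_epochs` is reached.
    """
    best_acc = float("-inf")
    best_epoch = 0
    epoch = 0
    for epoch, acc in enumerate(dev_accuracies[:max_epochs], start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, epoch


# Development accuracy after each epoch, copied from the Training results table.
dev_accuracies = [
    0.7216, 0.7699, 0.7358, 0.7557, 0.7528, 0.7727, 0.75, 0.7699,
    0.7614, 0.7699, 0.7642, 0.7528, 0.7727, 0.7528, 0.7727, 0.7727,
    0.7415, 0.7614, 0.7642, 0.7670, 0.7642, 0.7727, 0.75, 0.7841,
    0.7614, 0.7756, 0.7699, 0.7841, 0.7699, 0.7642, 0.7642, 0.7614,
    0.7614, 0.7585, 0.7585, 0.7614, 0.7784, 0.7585, 0.7812, 0.75,
    0.75, 0.7642, 0.7812, 0.7926, 0.7784, 0.7841, 0.7784, 0.7670,
    0.7614, 0.7727, 0.7670, 0.7670, 0.7756, 0.7756, 0.7841, 0.7642,
    0.7727, 0.7784, 0.7784, 0.7784, 0.7727, 0.7756, 0.7841, 0.7614,
]

best_epoch, stop_epoch = early_stopping(dev_accuracies)
print(best_epoch, stop_epoch)  # 44 64
```

Under these assumptions the rule selects epoch 44 and stops after epoch 64, which matches the selected checkpoint and the length of the results table.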

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
lr_scheduler_type: polynomial
num_epochs: 100
mixed_precision_training: Native AMP
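
The batch-size figures above are related by simple arithmetic: the total train batch size is the per-device batch size times the number of gradient accumulation steps times the number of devices. The sketch below checks this against the listed values and against the step counts in the Training results table. The single-device setting is an assumption, inferred only from the fact that 16 × 4 = 64.

```python
train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1  # assumption: not stated in the source

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 64

# The results table reports 38 optimizer steps per epoch (step 38 after epoch 1).
steps_per_epoch = 38
selected_epoch = 44
steps_at_selected_epoch = steps_per_epoch * selected_epoch
print(steps_at_selected_epoch)  # 1672, the step listed for epoch 44
```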

Training results

The final model was taken after the 44th epoch, which has the highest development set accuracy (0.7926).

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.836	1	38	0.6966	0.7216
0.5336	2	76	0.5948	0.7699
0.2913	3	114	0.7197	0.7358
0.1048	4	152	0.9570	0.7557
0.0424	5	190	1.2144	0.7528
0.0262	6	228	1.2675	0.7727
0.0169	7	266	1.4788	0.75
0.0048	8	304	1.5053	0.7699
0.0084	9	342	1.5368	0.7614
0.0087	10	380	1.6678	0.7699
0.0082	11	418	1.7598	0.7642
0.0104	12	456	1.6951	0.7528
0.0115	13	494	1.7123	0.7727
0.0111	14	532	1.7577	0.7528
0.0028	15	570	1.7383	0.7727
0.0032	16	608	2.0254	0.7727
0.0107	17	646	2.2123	0.7415
0.0056	18	684	1.9406	0.7614
0.0078	19	722	2.2002	0.7642
0.0041	20	760	2.0157	0.7670
0.0087	21	798	2.1228	0.7642
0.0113	22	836	2.3692	0.7727
0.0025	23	874	2.2211	0.75
0.0083	24	912	2.2120	0.7841
0.0104	25	950	2.1478	0.7614
0.0041	26	988	2.1118	0.7756
0.002	27	1026	1.9929	0.7699
0.001	28	1064	2.0295	0.7841
0.003	29	1102	2.3142	0.7699
0.006	30	1140	2.2957	0.7642
0.0005	31	1178	2.0661	0.7642
0.0007	32	1216	2.4220	0.7614
0.0088	33	1254	2.2842	0.7614
0.0	34	1292	2.4060	0.7585
0.0	35	1330	2.2088	0.7585
0.0	36	1368	2.2181	0.7614
0.0	37	1406	2.2560	0.7784
0.0	38	1444	2.4803	0.7585
0.0	39	1482	2.1163	0.7812
0.0087	40	1520	2.3410	0.75
0.0021	41	1558	2.3583	0.75
0.0054	42	1596	2.3546	0.7642
0.0051	43	1634	2.2295	0.7812
0.0	44	1672	2.2440	0.7926
0.0019	45	1710	2.3248	0.7784
0.0044	46	1748	2.3058	0.7841
0.0006	47	1786	2.3588	0.7784
0.0007	48	1824	2.6541	0.7670
0.0001	49	1862	2.4621	0.7614
0.0	50	1900	2.4696	0.7727
0.0	51	1938	2.4981	0.7670
0.0031	52	1976	2.6702	0.7670
0.0	53	2014	2.4448	0.7756
0.0	54	2052	2.4214	0.7756
0.0	55	2090	2.4308	0.7841
0.0001	56	2128	2.5869	0.7642
0.0007	57	2166	2.4803	0.7727
0.0	58	2204	2.4557	0.7784
0.0	59	2242	2.4702	0.7784
0.0	60	2280	2.5165	0.7784
0.0013	61	2318	2.6322	0.7727
0.0001	62	2356	2.6253	0.7756
0.0011	63	2394	2.6303	0.7841
0.0002	64	2432	2.5646	0.7614

Framework versions

Transformers 4.14.1
Pytorch 1.10.1+cu113
Datasets 1.16.1
Tokenizers 0.10.3

Contact

Kairit Sirts: [email protected]
