Model Description

This model, "bert-43-multilabel-emotion-detection", is a fine-tuned version of "bert-base-uncased", trained to classify sentences based on their emotional content into one of 43 categories in the English language. The model was trained on a combination of datasets including tweet_emotions, GoEmotions, and synthetic data, amounting to approximately 271,000 records with around 6,306 records per label.
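
To verify the label set programmatically, the model's configuration can be inspected without downloading the weights. A minimal sketch, assuming the hosted config carries the standard num_labels and id2label fields:

from transformers import AutoConfig

# Load only the configuration, not the model weights
config = AutoConfig.from_pretrained('borisn70/bert-43-multilabel-emotion-detection')

print(config.num_labels)   # expected: 43
print(config.id2label[0])  # expected: 'admiration' (see Labels Mapping below)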

Intended Use

This model is intended for any application that requires understanding or categorizing the emotional content of English text. This could include sentiment analysis, social media monitoring, customer feedback analysis, and more.

Training Data

The training data comprises the following datasets:

  • Tweet Emotions
  • GoEmotions
  • Synthetic data

Training Procedure

The model was trained for 20 epochs, taking about 6 hours on a Google Colab V100 GPU with 16 GB of memory.

The following training arguments were used:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='results',             # directory for checkpoints and outputs
    optim="adamw_torch",              # PyTorch implementation of AdamW
    learning_rate=2e-5,               # learning rate
    num_train_epochs=20,              # total number of training epochs
    per_device_train_batch_size=128,  # batch size per device during training
    per_device_eval_batch_size=128,   # batch size for evaluation
    warmup_steps=500,                 # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs',             # directory for storing logs
    logging_steps=100,
)
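
These arguments are then passed to a Trainer together with the model and tokenized datasets. Below is a minimal illustrative sketch, not the exact training script; train_dataset and eval_dataset are hypothetical placeholders for the tokenized splits of the data described above:

from transformers import AutoModelForSequenceClassification, Trainer

# Initialize bert-base-uncased with a fresh 43-way classification head
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=43
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized training split
    eval_dataset=eval_dataset,    # placeholder: tokenized validation split
)
trainer.train()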

Performance

The model achieved the following performance metrics on the validation set:

  • Accuracy: 92.02%
  • Weighted F1-Score: 91.93%
  • Weighted Precision: 91.88%
  • Weighted Recall: 92.02%

Per-label performance for all 43 labels is given in the Accuracy Report below.

Labels Mapping

Label ID  Emotion
0         admiration
1         amusement
2         anger
3         annoyance
4         approval
5         caring
6         confusion
7         curiosity
8         desire
9         disappointment
10        disapproval
11        disgust
12        embarrassment
13        excitement
14        fear
15        gratitude
16        grief
17        joy
18        love
19        nervousness
20        optimism
21        pride
22        realization
23        relief
24        remorse
25        sadness
26        surprise
27        neutral
28        worry
29        happiness
30        fun
31        hate
32        autonomy
33        safety
34        understanding
35        empty
36        enthusiasm
37        recreation
38        sense of belonging
39        meaning
40        sustenance
41        creativity
42        boredom
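
When running the model directly rather than through a pipeline, the predicted class index can be mapped back to an emotion name via config.id2label, which should mirror the table above. A minimal PyTorch sketch:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = 'borisn70/bert-43-multilabel-emotion-detection'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("I can't wait for the weekend!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to its emotion name
pred_id = logits.argmax(dim=-1).item()
print(pred_id, model.config.id2label[pred_id])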

Accuracy Report

Label         Precision  Recall  F1-Score
0             0.8625     0.7969  0.8284
1             0.9128     0.9558  0.9338
2             0.9028     0.8749  0.8886
3             0.8570     0.8639  0.8605
4             0.8584     0.8449  0.8516
5             0.9343     0.9667  0.9502
6             0.9492     0.9696  0.9593
7             0.9234     0.9462  0.9347
8             0.9644     0.9924  0.9782
9             0.9481     0.9377  0.9428
10            0.9250     0.9267  0.9259
11            0.9653     0.9914  0.9782
12            0.9948     0.9976  0.9962
13            0.9474     0.9676  0.9574
14            0.8926     0.8853  0.8889
15            0.9501     0.9515  0.9508
16            0.9976     0.9990  0.9983
17            0.9114     0.8716  0.8911
18            0.7825     0.7821  0.7823
19            0.9962     0.9990  0.9976
20            0.9516     0.9638  0.9577
21            0.9953     0.9995  0.9974
22            0.9630     0.9791  0.9710
23            0.9134     0.9134  0.9134
24            0.9753     0.9948  0.9849
25            0.7374     0.7469  0.7421
26            0.7864     0.7583  0.7721
27            0.6000     0.5666  0.5828
28            0.7369     0.6836  0.7093
29            0.8066     0.7222  0.7620
30            0.9116     0.9225  0.9170
31            0.9108     0.9524  0.9312
32            0.9611     0.9634  0.9622
33            0.9592     0.9724  0.9657
34            0.9700     0.9686  0.9693
35            0.9459     0.9734  0.9594
36            0.9359     0.9857  0.9601
37            0.9986     0.9986  0.9986
38            0.9943     0.9990  0.9967
39            0.9990     1.0000  0.9995
40            0.9905     0.9914  0.9910
41            0.9981     0.9948  0.9964
42            0.9929     0.9986  0.9957
weighted avg  0.9188     0.9202  0.9193
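
For reference, a table in this format is what scikit-learn's classification_report produces with digits=4; presumably something similar was used here. A toy sketch with dummy labels (not the actual evaluation data) shows the call:

from sklearn.metrics import classification_report

# Dummy labels for illustration only; the report above was computed on the
# 43-class validation set
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]
print(classification_report(y_true, y_pred, digits=4))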

How to Use

from transformers import pipeline

# The repository ID resolves both the model weights and the tokenizer
model_id = 'borisn70/bert-43-multilabel-emotion-detection'

# Create a text-classification pipeline ('sentiment-analysis' is an alias)
nlp = pipeline('text-classification', model=model_id, tokenizer=model_id)

# Classify a sample sentence
result = nlp("I feel great about this!")

# Print the predicted label and score
print(result)
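
To retrieve scores for all 43 labels instead of only the top prediction, recent versions of transformers accept a top_k argument on text-classification pipelines (older releases used return_all_scores=True instead); the exact return shape varies by version, so treat this as a sketch:

# Score all 43 labels for one sentence, sorted by score
all_scores = nlp("I feel great about this!", top_k=None)
print(all_scores)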

Limitations and Biases

  • The model's performance varies significantly across emotion categories, particularly those with less representation in the training data; per-label F1 in the Accuracy Report above ranges from roughly 0.58 (neutral) to above 0.99 (e.g., grief, pride, meaning).
  • Users should be cautious about potential biases in the training data, which may be reflected in the model's predictions.

Contact

If you have any questions, feedback, or would like to report any issues regarding the model, please feel free to reach out.
