metadata

license: apache-2.0
tags:
  - generated_from_trainer
base_model: indolem/indobertweet-base-uncased
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: er-model
    results: []
datasets:
  - SEACrowd/prdect_id
language:
  - id
widget:
  - text: Ini toko korup.,ga sesuai sama isinya..not recommended
    example_title: Contoh

indobertweet-base-uncased-emotion-recognition

Model description

This model is a fine-tuned version of indolem/indobertweet-base-uncased on the PRDECT-ID dataset, a compilation of Indonesian product reviews annotated with emotion and sentiment labels. The reviews were gathered from Tokopedia, one of Indonesia's largest e-commerce platforms. It achieves the following results on the evaluation set:

Loss: 0.6762
Accuracy: 0.6981
Precision: 0.7022
Recall: 0.6981
F1: 0.6963

It has been trained to classify text into five emotion categories: happy, sadness, anger, love, and fear.
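A minimal inference sketch with the Transformers pipeline API may help readers try the classifier. The repository id in MODEL_ID is a placeholder, not the confirmed location of this checkpoint, and the helper names top_emotion and classify are illustrative only. The label strings returned depend on the id2label mapping saved with the model.

```python
from typing import List

# Placeholder repository id (an assumption): replace with the real Hub path.
MODEL_ID = "your-username/indobertweet-base-uncased-emotion-recognition"


def top_emotion(scores: List[dict]) -> str:
    """Return the label with the highest score from a list of {label, score} dicts."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(text: str, model_id: str = MODEL_ID) -> str:
    """Load the fine-tuned checkpoint and return the most likely emotion label."""
    # Imported lazily so top_emotion can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None asks the pipeline for the scores of every label.
    scores = classifier([text], top_k=None)[0]
    return top_emotion(scores)
```

Calling classify with the widget example text ("Ini toko korup.,ga sesuai sama isinya..not recommended") would download the checkpoint on first use and return a single label string.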

Training and evaluation data

I split the dataframe df into training, validation, and test sets (train_df, val_df, test_df) using the train_test_split function from sklearn.model_selection. The initial split holds out 20% of the data, and that held-out portion is then divided equally between the validation and test sets. Stratifying on the label (stratify=df['label']) ensures that the resulting splits, including val_df and test_df, keep the same class distribution as the original dataset.
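The two-step split described above can be sketched as follows. This is an illustration rather than the original training script: the toy dataframe, the temp_df name, and the random_state value are assumptions, as is stratifying the second split on temp_df["label"].

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataframe: 100 rows, 5 balanced classes.
df = pd.DataFrame({
    "text": [f"review {i}" for i in range(100)],
    "label": [i % 5 for i in range(100)],
})

# First split: keep 80% for training and hold out 20%, stratified by label.
train_df, temp_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

# Second split: divide the held-out 20% equally into validation and test sets.
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42
)

print(len(train_df), len(val_df), len(test_df))  # 80 10 10
```

On real data the proportions come out at roughly 80/10/10, with each class represented in every split at close to its overall frequency.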

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5
mixed_precision_training: Native AMP
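
As a rough guide to reproducing this configuration, the values above can be written as keyword arguments named after transformers.TrainingArguments fields. The mapping from the card's names to those fields (for example, treating the batch sizes as per-device values) is an assumption, and TrainingArguments would also need an output_dir. The linear_lr helper is hypothetical: it illustrates linear decay under the assumption of zero warmup steps, using the 266 steps per epoch reported in the training results.

```python
# Keyword arguments named after transformers.TrainingArguments fields.
# The mapping from the card's names to these fields is an assumption.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
    "fp16": True,  # Native AMP mixed precision
}

STEPS_PER_EPOCH = 266  # taken from the training results table
TOTAL_STEPS = STEPS_PER_EPOCH * training_kwargs["num_train_epochs"]


def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate under linear decay to zero, assuming no warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps
```

Under these assumptions the learning rate starts at 2e-05, is halved by the midpoint of training, and reaches zero at the final step.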

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.7817	1.0	266	0.6859	0.7057	0.7140	0.7057	0.7061
0.6052	2.0	532	0.6762	0.6981	0.7022	0.6981	0.6963
0.488	3.0	798	0.7251	0.7189	0.7208	0.7189	0.7192
0.3578	4.0	1064	0.7943	0.7208	0.7240	0.7208	0.7222
0.2887	5.0	1330	0.8250	0.7038	0.7093	0.7038	0.7056
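
In every row of the table, recall equals accuracy, which is what support-weighted averaging across classes produces. The card does not state the averaging mode, so the sketch below assumes average="weighted" and shows one way such metrics could be computed with scikit-learn. The compute_metrics name and the toy labels are illustrative.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    accuracy = accuracy_score(y_true, y_pred)
    # average="weighted" is an assumption; it weights each class by its support.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 4 of 5 predictions are correct.
metrics = compute_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(metrics)
```

With weighted averaging, each class's recall is weighted by its share of the true labels, so the weighted recall reduces to the fraction of correct predictions, which is the accuracy.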

Framework versions

Transformers 4.41.2
Pytorch 2.1.2
Datasets 2.19.2
Tokenizers 0.19.1