Pre-Trained Emotion Classifiers for AffectiveLens
Model Description
This repository contains a collection of six machine learning models trained for the AffectiveLens project. Each model is a classifier that predicts the emotional valence (Positive, Negative, or Neutral) of a given text.
The models were trained on 768-dimensional sentence embeddings generated by the `distilbert-base-uncased` model. The source text comes from the GoEmotions dataset; the processed version is available at psyrishi/MoodPulse.
This repository lets you download and use the final, trained artifacts of the study without retraining them.
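If you also want the underlying data, it can in principle be pulled straight from the Hub. This is a hedged sketch: it assumes psyrishi/MoodPulse is a standard `datasets`-compatible repository with the usual splits, which is not guaranteed by anything stated here.

```python
from datasets import load_dataset

# Assumption: psyrishi/MoodPulse loads as a regular Hugging Face dataset.
# If the repo stores raw files instead, use huggingface_hub to fetch them.
dataset = load_dataset("psyrishi/MoodPulse")
print(dataset)
```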
Models in this Repository
The following scikit-learn compatible models are available. They are saved as `.pkl` files serialized with `joblib`, and each filename encodes the model's micro-F1 score on the validation set, which was used for initial selection (a sketch for downloading and comparing all of them follows the list).
- LightGBM_MicroF1_0.6240.pkl (Champion Model)
- XGBoost_MicroF1_0.6205.pkl
- Random_Forest_MicroF1_0.6192.pkl
- CatBoost_MicroF1_0.6175.pkl
- Logistic_Regression_MicroF1_0.6026.pkl
- Linear_SVM_MicroF1_0.6005.pkl
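Because the validation micro-F1 is embedded in each filename, the file list doubles as a small manifest. Here is a minimal sketch that downloads every model and parses the score back out of the filename; note that unpickling a model requires its underlying library (LightGBM, XGBoost, CatBoost, scikit-learn) to be installed.

```python
import joblib
from huggingface_hub import hf_hub_download

REPO_ID = "psyrishi/affectivelens-emotion-models"
FILENAMES = [
    "LightGBM_MicroF1_0.6240.pkl",
    "XGBoost_MicroF1_0.6205.pkl",
    "Random_Forest_MicroF1_0.6192.pkl",
    "CatBoost_MicroF1_0.6175.pkl",
    "Logistic_Regression_MicroF1_0.6026.pkl",
    "Linear_SVM_MicroF1_0.6005.pkl",
]

models = {}
for filename in FILENAMES:
    path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    # The model name and its validation micro-F1 are both encoded in the filename.
    name, _, score = filename[:-len(".pkl")].rpartition("_MicroF1_")
    models[name] = {"model": joblib.load(path), "val_micro_f1": float(score)}

for name, entry in models.items():
    print(f"{name}: validation micro-F1 = {entry['val_micro_f1']:.4f}")
```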
How to Use
You can download any model from this repository using the `huggingface_hub` library. You will also need a Transformer model (such as `distilbert-base-uncased`) to generate the embeddings for your input text.
Here is an example of how to download and use the champion model (LightGBM):
```python
import joblib
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# --- 1. Load the Embedding Model and Tokenizer ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
embedding_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)
embedding_model.eval()

# --- 2. Download the Champion Classifier from the Hub ---
REPO_ID = "psyrishi/affectivelens-emotion-models"
FILENAME = "LightGBM_MicroF1_0.6240.pkl"

print(f"Downloading model '{FILENAME}' from '{REPO_ID}'...")
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# --- 3. Load the Classifier ---
classifier = joblib.load(model_path)
print("Successfully loaded the champion classifier.")

# --- 4. Create a Prediction Function ---
def predict_emotion(text: str) -> str:
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get the [CLS]-token embedding (first position) from the Transformer model
    with torch.no_grad():
        outputs = embedding_model(**inputs)
    embedding = outputs.last_hidden_state[:, 0, :].cpu().numpy()

    # Use the classifier to predict the valence class
    prediction_index = classifier.predict(embedding)[0]
    emotion_labels = ['negative', 'neutral', 'positive']
    return emotion_labels[prediction_index]

# --- 5. Make a Prediction ---
my_text = "This was an amazing experience, I am so happy!"
predicted_emotion = predict_emotion(my_text)
print(f"\nText: '{my_text}'")
print(f"--> Predicted Emotion: {predicted_emotion}")
```
Training Procedure
Training Data
The models were trained on a processed version of the GoEmotions dataset. The full data pipeline, including the raw data, tokenized data, and final embeddings, is available in the psyrishi/MoodPulse repository. The training set was balanced using random oversampling.
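The exact oversampling implementation is not pinned down here; a common choice, and only an assumption on my part, is `RandomOverSampler` from the imbalanced-learn package, which duplicates minority-class rows until every class matches the majority count:

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler

# Toy stand-ins for the real 768-dimensional embeddings and valence labels.
X = np.random.rand(100, 768)
y = np.array([0] * 70 + [1] * 20 + [2] * 10)  # imbalanced classes

# Randomly duplicate minority-class samples until every class has 70 rows.
X_balanced, y_balanced = RandomOverSampler(random_state=42).fit_resample(X, y)
print(np.bincount(y_balanced))  # -> [70 70 70]
```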
Training Workflow
The complete end-to-end workflow, from data preparation to model training and evaluation, is documented in the main project repository: AffectiveLens on GitHub.
Evaluation Results
The models were evaluated on a held-out test set that was not used during training or model selection. The LightGBM model emerged as the champion performer.
Champion Model (LightGBM) Performance on Test Set:
- F1-Score (Micro): 0.6223
- Accuracy: 62.23%
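Note that for single-label multiclass classification, micro-averaged F1 is mathematically identical to accuracy, which is why the two numbers above agree. A minimal sketch of how these metrics can be reproduced with scikit-learn (`X_test_embeddings` and `y_test` are hypothetical placeholders for the held-out embeddings and labels):

```python
from sklearn.metrics import accuracy_score, f1_score

# y_test: true labels; X_test_embeddings: 768-dim embeddings of the test texts.
y_pred = classifier.predict(X_test_embeddings)

print(f"F1-Score (Micro): {f1_score(y_test, y_pred, average='micro'):.4f}")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
```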
Citation
If you use these models, please cite the original GoEmotions paper:
```bibtex
@inproceedings{demszky2020goemotions,
  title     = {GoEmotions: A Dataset of Fine-Grained Emotions},
  author    = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year      = {2020}
}
```
Licensing Information
The models in this repository are licensed under the MIT License.