Pre-Trained Emotion Classifiers for AffectiveLens

Model Description

This repository contains six machine learning classifiers trained for the AffectiveLens project. Each one predicts the emotional valence (positive, negative, or neutral) of a given text.

The models were trained on 768-dimensional sentence embeddings generated by the distilbert-base-uncased model. The source text comes from the GoEmotions dataset; the processed version is available at psyrishi/MoodPulse.

You can download and use the final trained artifacts of the study directly, without retraining them.

Models in this Repository

The following scikit-learn-compatible models are available. They are saved as .pkl files using joblib. Each filename includes the model's micro-averaged F1-score on the validation set, which was used for initial model selection.

  • LightGBM_MicroF1_0.6240.pkl (Champion Model)
  • XGBoost_MicroF1_0.6205.pkl
  • Random_Forest_MicroF1_0.6192.pkl
  • CatBoost_MicroF1_0.6175.pkl
  • Logistic_Regression_MicroF1_0.6026.pkl
  • Linear_SVM_MicroF1_0.6005.pkl
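
Because each filename embeds the validation micro F1-score, the score can be read back programmatically. The following is a minimal sketch that assumes the `<ModelName>_MicroF1_<score>.pkl` pattern seen in the list above; the helper name `parse_micro_f1` is hypothetical and not part of the repository.

```python
import re

# Filenames copied from the list above.
MODEL_FILES = [
    "LightGBM_MicroF1_0.6240.pkl",
    "XGBoost_MicroF1_0.6205.pkl",
    "Random_Forest_MicroF1_0.6192.pkl",
    "CatBoost_MicroF1_0.6175.pkl",
    "Logistic_Regression_MicroF1_0.6026.pkl",
    "Linear_SVM_MicroF1_0.6005.pkl",
]

# Assumed naming pattern: <ModelName>_MicroF1_<score>.pkl
_PATTERN = re.compile(r"^(?P<name>.+)_MicroF1_(?P<score>\d+\.\d+)\.pkl$")


def parse_micro_f1(filename: str) -> tuple[str, float]:
    """Split a filename into (model name, validation micro F1-score)."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename format: {filename!r}")
    return match.group("name"), float(match.group("score"))


# Pick the file with the highest validation score.
best_file = max(MODEL_FILES, key=lambda f: parse_micro_f1(f)[1])
print(best_file)
```

The selected filename can then be passed as the `filename` argument of `hf_hub_download`.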

How to Use

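You can download any model from this repository using the huggingface_hub library. You will also need the Transformer model used during training (distilbert-base-uncased) to generate embeddings for your input text; embeddings from a different model will not match the feature space the classifiers were trained on.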

Here is an example of how to download and use the champion model (LightGBM):

import joblib
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# --- 1. Load the Embedding Model and Tokenizer ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
embedding_model = AutoModel.from_pretrained("distilbert-base-uncased").to(device)
embedding_model.eval()

# --- 2. Download the Champion Classifier from the Hub ---
REPO_ID = "psyrishi/affectivelens-emotion-models"
FILENAME = "LightGBM_MicroF1_0.6240.pkl"

print(f"Downloading model '{FILENAME}' from '{REPO_ID}'...")
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# --- 3. Load the Classifier ---
classifier = joblib.load(model_path)
print("Successfully loaded the champion classifier.")

# --- 4. Create a Prediction Function ---
def predict_emotion(text: str):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Use the first-token ([CLS]) hidden state as the 768-dimensional sentence embedding
    with torch.no_grad():
        outputs = embedding_model(**inputs)
        embedding = outputs.last_hidden_state[:, 0, :].cpu().numpy()

    # Use the classifier to predict
    prediction_index = classifier.predict(embedding)[0]
    emotion_labels = ['negative', 'neutral', 'positive']
    
    return emotion_labels[prediction_index]

# --- 5. Make a Prediction ---
my_text = "This was an amazing experience, I am so happy!"
predicted_emotion = predict_emotion(my_text)
print(f"\nText: '{my_text}'")
print(f"--> Predicted Emotion: {predicted_emotion}")

Training Procedure

Training Data

The models were trained on a processed version of the GoEmotions dataset. The full data pipeline, including the raw data, tokenized data, and final embeddings, is available at the psyrishi/MoodPulse repository. The training set was balanced using random oversampling.
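
To illustrate what balancing by random oversampling means, here is a minimal standard-library sketch of the general technique. It is not the project's actual pipeline: the function name `random_oversample`, the seed, and the toy data are assumptions made for illustration. In practice a library implementation such as `RandomOverSampler` from imbalanced-learn is commonly used instead.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_features, out_labels = list(features), list(labels)
    for cls, count in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample with replacement to close the gap to the majority class size.
        for i in rng.choices(idx, k=target - count):
            out_features.append(features[i])
            out_labels.append(cls)
    return out_features, out_labels


# Toy data (hypothetical): class 0 has 4 rows, class 1 has 2, class 2 has 1.
X = [[0.1], [0.2], [0.3], [0.4], [1.0], [1.1], [2.0]]
y = [0, 0, 0, 0, 1, 1, 2]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # every class now has 4 rows
```

Oversampling is applied to the training split only, so that duplicated rows never leak into validation or test data.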

Training Workflow

The complete end-to-end workflow, from data preparation to model training and evaluation, is documented in the main project repository: AffectiveLens on GitHub.

Evaluation Results

The models were evaluated on a held-out test set. LightGBM performed best and was selected as the champion model.

Champion Model (LightGBM) Performance on Test Set:

  • F1-Score (Micro): 0.6223
  • Accuracy: 62.23%
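
The two figures above are identical because, for single-label multiclass classification, the micro-averaged F1-score reduces to accuracy: every misclassified sample counts as exactly one false positive (for the predicted class) and one false negative (for the true class). The sketch below checks this on a small set of made-up labels; the data and function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)


# Made-up labels: 0 = negative, 1 = neutral, 2 = positive.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]

print(accuracy(y_true, y_pred))  # 0.625
print(micro_f1(y_true, y_pred))  # 0.625
```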

Citation

If you use these models, please cite the original GoEmotions paper:

@inproceedings{demszky2020goemotions,
  title={GoEmotions: A Dataset of Fine-Grained Emotions},
  author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2020}
}

Licensing Information

The models in this repository are licensed under the MIT License.
