Model Description

This is a fine-tuned roberta-base model that identifies the strength of emotions expressed in an input comment.

Downstream Use

Embeddings for comments can also be extracted and used for downstream analyses.
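
As a minimal, hypothetical sketch (the repository id "model-name" is a placeholder, matching the usage example below), comment embeddings can be pulled from the encoder's last hidden state by requesting hidden states from the classification model:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Placeholder repository id; replace with this model's actual name
model = AutoModelForSequenceClassification.from_pretrained("model-name", output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained("model-name")

inputs = tokenizer("An example comment.", return_tensors="pt", truncation=True, max_length=250)
with torch.no_grad():
    outputs = model(**inputs)

# Embedding of the first (<s>) token from the last encoder layer; shape (1, 768) for roberta-base
embedding = outputs.hidden_states[-1][:, 0, :]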

Bias, Risks, and Limitations

Risks: if you are genuinely unsure of a paragraph or comment's sentiment, defer to human judgment. The model shows some bias toward the more widely represented training classes.

Caring is a somewhat confusing category. During training, comments were annotated as "caring" if they included either sympathetic content or indignation on behalf of others. This category should eventually be split into separate labels such as "indignance" and "caring".

Sarcasm is treated as the combination of "amusement" and "disapproval". "Amusement" can apply to irony and a humorous tone, but in practice it largely captures sarcasm. Adding a dedicated sarcasm class is a much-needed improvement that will be pursued later on (a simple heuristic for flagging this combination is sketched after the usage example below).

There are not many outright risks, just MANY limitations. The training dataset was initially imbalanced; this was mitigated with data augmentation and a weighted loss function. Nonetheless, the model struggles with sarcasm and occasionally produces unpredictable predictions because of dominating classes.
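
As a rough illustration of the weighted-loss idea only (the label count and class counts below are placeholders, not the values used to train this model), a per-class pos_weight can be passed to BCEWithLogitsLoss for multi-label training:

import torch
import torch.nn as nn

num_labels = 28                                   # assumption: illustrative label count
pos_counts = torch.tensor([120.0] * num_labels)   # placeholder per-class positive counts
neg_counts = torch.tensor([880.0] * num_labels)   # placeholder per-class negative counts

# Upweight rare classes: each label's weight is (negatives / positives)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=neg_counts / pos_counts)

logits = torch.randn(4, num_labels)               # model outputs for a batch of 4 comments
targets = torch.randint(0, 2, (4, num_labels)).float()
loss = loss_fn(logits, targets)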

Ultimately, I hope a struggling grad or undergrad student finds this model useful for whatever project they want to pursue.

My own use of the model can be found at the GitHub link below:

https://github.com/AnnaMarieHo/sentiment-analysis/tree/main

How to Get Started with the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import numpy as np

def predict_emotions(text, model_name, threshold=0.35):
    # Load model and tokenizer
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    # Tokenize and predict
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=250)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probabilities = torch.sigmoid(logits).numpy()[0]
    
    # Map probabilities to emotions
    emotions = {emotion: float(prob) for emotion, prob in zip(model.config.id2label.values(), probabilities)}
    
    # Get emotions above threshold and sort by probability
    predicted_emotions = [(emotion, prob) for emotion, prob in emotions.items() if prob >= threshold]
    predicted_emotions.sort(key=lambda x: x[1], reverse=True)
    
    return {
        "text": text,
        "predicted_emotions": predicted_emotions,
        "all_probabilities": dict(sorted(emotions.items(), key=lambda x: x[1], reverse=True)),
        "threshold_used": threshold
    }

# Example usage
result = predict_emotions(
    "I'm feeling really excited and happy about this news!", 
    "model-name",
    threshold=0.35  # Customize threshold here
)

# Print results
print(f"Text: {result['text']}")
print("\nDetected emotions (sorted by probability):")
for emotion, prob in result['predicted_emotions']:
    print(f"  - {emotion.upper()} ({prob:.4f})")

print("\nAll emotion probabilities (sorted):")
for emotion, prob in result['all_probabilities'].items():
    print(f"  {'*' if prob >= result['threshold_used'] else ' '} {emotion}: {prob:.4f}")

Training Hyperparameters

Evaluation

Testing Data, Factors & Metrics

Testing Data

Metrics

Results

Summary

Model Architecture and Objective
