5-Point Sentiment Classifier (Longformer), by spacesedan

A fine-tuned Longformer model for 5-point sentiment classification of long-form user-generated content such as Reddit posts. It assigns each text one of five labels on a scale from very negative to very positive.


Labels

Label Index    Sentiment
0              Very Negative
1              Negative
2              Neutral
3              Positive
4              Very Positive
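
As a minimal sketch of how the label indices above map to model output, the snippet below loads the checkpoint with the Hugging Face transformers library and decodes the highest-scoring logit. It assumes that torch and transformers are installed, that the repository ID spacesedan/reddit-sentiment-analysis-longformer resolves on the Hub, and that logit position i corresponds to label index i in the table. The helper names label_from_scores and predict_sentiment are illustrative and not part of the model.

```python
# Label table from this model card.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}

# Repository ID as shown on the model page.
MODEL_ID = "spacesedan/reddit-sentiment-analysis-longformer"


def label_from_scores(scores):
    """Return the sentiment name for the highest of five class scores."""
    if len(scores) != len(ID2LABEL):
        raise ValueError("expected one score per label")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best_index]


def predict_sentiment(texts, model_id=MODEL_ID, max_length=1024):
    """Classify a list of strings and return one sentiment name per string.

    Imports are deferred so the helpers above stay usable without
    torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Assumption: truncate to the 1024-token maximum sequence length
    # listed under Training Configuration.
    batch = tokenizer(
        texts,
        truncation=True,
        max_length=max_length,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    return [label_from_scores(row) for row in logits.tolist()]
```

Calling predict_sentiment(["some long Reddit post ..."]) would return a list with one label name per input. Text beyond max_length tokens is cut off, so very long posts are classified on their opening portion only.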

Datasets Used

The model was fine-tuned on a combination of three datasets:

  1. GoEmotions by Google
    → Converted 27 emotion labels into a 5-point sentiment scale.

  2. Amazon Reviews (fine-grained)
    → Large-scale consumer review dataset with fine-grained sentiment labels.

  3. Kaggle: Twitter and Reddit Sentimental Analysis Dataset
    → Adapted into a 3-class and eventually 5-class format for compatibility.
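
The GoEmotions conversion is described only as a mapping from 27 emotion labels to a 5-point scale, so the sketch below is purely illustrative. The emotion-to-score table is a hypothetical, partial assignment (not the author's actual mapping), and averaging the scores of a multi-label example is likewise an assumption about how conflicting emotions might be reconciled.

```python
# HYPOTHETICAL partial mapping from GoEmotions emotion names to the
# 5-point scale. The mapping actually used for this model is not documented.
EMOTION_TO_SCORE = {
    "grief": 0, "anger": 0, "disgust": 0,
    "sadness": 1, "annoyance": 1, "disappointment": 1,
    "neutral": 2, "confusion": 2, "curiosity": 2,
    "approval": 3, "optimism": 3, "relief": 3,
    "joy": 4, "love": 4, "gratitude": 4,
}


def to_five_point(emotions):
    """Map one example's emotion labels to a single 0-4 score.

    Returns None when none of the labels is covered by the table.
    """
    scores = [EMOTION_TO_SCORE[e] for e in emotions if e in EMOTION_TO_SCORE]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    # Round half up to the nearest class index (scores are non-negative).
    return int(mean + 0.5)
```

Under this sketch an example tagged with both anger and joy lands on Neutral, which illustrates the kind of overlap a heuristic mapping can introduce.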


Training Configuration

Setting                  Value
Model Base               Longformer (4096)
Max Sequence Length      1024 tokens
Epochs                   4
Batch Size               8
Gradient Accumulation    4
Optimizer                adamw_torch
Learning Rate            2e-5
Scheduler                Linear
Mixed Precision          FP16
Weight Decay             0.01
Warmup Proportion        0.1
Early Stopping           patience=5, threshold=0.01
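
To make the interaction of these settings concrete, here is a small sketch in plain Python. It assumes the batch size of 8 is per device on a single device, and that the linear scheduler ramps up from zero during warmup and then decays linearly to zero, which is the usual behavior of a linear schedule with warmup. The function names and the total step count in the usage note are illustrative; the real number of optimizer steps depends on the dataset size, which is not stated here.

```python
# Values taken from the configuration table.
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 2e-5
WARMUP_PROPORTION = 0.1


def effective_batch_size(batch_size=BATCH_SIZE,
                         accum_steps=GRAD_ACCUM_STEPS,
                         num_devices=1):
    """Examples contributing to each optimizer update."""
    return batch_size * accum_steps * num_devices


def linear_lr(step, total_steps, warmup_steps, peak_lr=LEARNING_RATE):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over warmup_steps, then falls
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

With these settings each update sees 8 × 4 = 32 examples. For a hypothetical run of 1000 optimizer steps, a warmup proportion of 0.1 gives 100 warmup steps, so linear_lr(100, 1000, 100) is the peak rate of 2e-5 and the rate returns to zero at step 1000.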

Final Evaluation Metrics

Metric                 Score
Accuracy               0.671
F1 Score (Macro)       0.642
F1 Score (Weighted)    0.673
Precision (Macro)      0.642
Recall (Macro)         0.646
Loss                   0.882
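
The macro and weighted F1 scores differ because macro averaging gives every class equal weight, while weighted averaging weights each class by its number of true examples. The sketch below shows both on a toy two-class example; the data is invented for illustration and has nothing to do with the evaluation set behind the scores above.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 scores weighted by class support."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = {c: sum(1 for t in y_true if t == c) for c in labels}
    return sum(scores[c] * support[c] for c in labels) / len(y_true)


# Toy data: class 0 is twice as frequent as class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(per_class_f1(y_true, y_pred, [0, 1]))  # {0: 0.75, 1: 0.5}
print(macro_f1(y_true, y_pred, [0, 1]))      # 0.625
```

In the toy data the more frequent class scores higher, so the weighted F1 (about 0.667) exceeds the macro F1 (0.625). The reported gap between 0.673 and 0.642 is consistent with the same pattern, although per-class scores are not listed here.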

Use Cases

  • Tracking sentiment across Reddit posts, especially for news or trending headlines.
  • Analyzing long-form product reviews.
  • Building a sentiment dashboard for user forums or blogs.
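
For the dashboard and tracking use cases, predictions usually need to be aggregated per thread, headline, or forum. The sketch below shows one possible aggregation over predicted label indices. The function names and the (group, index) input format are assumptions for illustration, and the mean index treats the five labels as evenly spaced, which is a simplification.

```python
from collections import Counter

LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def summarize(predicted_indices):
    """Share of each label plus the mean label index for one group."""
    if not predicted_indices:
        raise ValueError("need at least one prediction")
    counts = Counter(predicted_indices)
    total = len(predicted_indices)
    shares = {LABELS[i]: counts.get(i, 0) / total for i in range(len(LABELS))}
    mean_index = sum(predicted_indices) / total
    return {"shares": shares, "mean_index": mean_index}


def summarize_by_group(rows):
    """Aggregate (group_key, predicted_index) pairs per group."""
    grouped = {}
    for key, index in rows:
        grouped.setdefault(key, []).append(index)
    return {key: summarize(indices) for key, indices in grouped.items()}
```

For example, summarize([4, 4, 2, 0]) reports a mean index of 2.5 with half of the posts labeled Very Positive, which shows why the label shares are worth displaying next to the mean: a polarized thread and a uniformly neutral one can have similar averages.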

Limitations

  • Model is trained on English text only.
  • Sentiment can be subjective, especially across edge cases (e.g., sarcasm or dark humor).
  • 5-class mapping from GoEmotions is heuristic and might introduce some overlap.

Acknowledgements

Special thanks to the original dataset creators:

  • Google (GoEmotions)
  • Yassir Acharki (Amazon Reviews fine-grained)
  • Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)

License

This model is available under the same license as the base model (Longformer) and is intended for research and educational use.


✅ Created and maintained by spacesedan

Model size: 149M params (Safetensors, tensor type F32)