5-Point Sentiment Classifier (Longformer) — by spacesedan

A fine-tuned Longformer model for 5-point sentiment classification of long-form user-generated content such as Reddit posts. It assigns each input one of five labels on a scale from very negative to very positive.

Labels

Label Index	Sentiment
0	Very Negative
1	Negative
2	Neutral
3	Positive
4	Very Positive
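
As a minimal sketch, the label table above can be written as a lookup from class index to name and applied to one row of raw class scores. The names `ID2LABEL` and `label_from_logits` are illustrative and not part of the released model, and the logits in the usage line are made up.

```python
import math

# Index-to-label mapping, copied from the table above.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}


def label_from_logits(logits):
    """Return (label, probability) for one row of 5 raw class scores."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest-probability class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, for illustration only.
label, prob = label_from_logits([-1.2, 0.3, 0.1, 2.4, 0.9])
print(label, round(prob, 3))
```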

Datasets Used

This model was fine-tuned on a combination of three datasets:

GoEmotions by Google
→ Converted 27 emotion labels into a 5-point sentiment scale.
Amazon Reviews (fine-grained)
→ Large-scale consumer review dataset with fine-grained sentiment labels.
Kaggle: Twitter and Reddit Sentimental Analysis Dataset
→ Adapted first into a 3-class and then into a 5-class format for compatibility.
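
The card does not publish the actual emotion-to-sentiment mapping, so the following is only a hypothetical sketch of how a heuristic conversion from GoEmotions labels to the 5-point scale could look. The polarity values in `EMOTION_POLARITY`, the averaging rule, and the function name are all assumptions made for illustration; only a subset of the emotion labels is listed.

```python
import math

# HYPOTHETICAL polarity scores in [-2, 2] for a subset of GoEmotions labels.
# These values are assumptions, not the mapping used to train the model.
EMOTION_POLARITY = {
    "grief": -2, "anger": -2, "disgust": -2,
    "sadness": -1, "annoyance": -1, "disappointment": -1,
    "neutral": 0, "curiosity": 0,
    "approval": 1, "optimism": 1, "relief": 1,
    "joy": 2, "love": 2, "gratitude": 2,
}


def emotions_to_sentiment(emotions):
    """Map a list of emotion labels to a class index in 0..4."""
    scores = [EMOTION_POLARITY[e] for e in emotions if e in EMOTION_POLARITY]
    if not scores:
        return 2  # fall back to Neutral when no known emotion is present
    mean = sum(scores) / len(scores)
    # Round half up, then shift from [-2, 2] to [0, 4].
    return int(math.floor(mean + 0.5)) + 2


print(emotions_to_sentiment(["joy", "gratitude"]))
print(emotions_to_sentiment(["sadness", "joy"]))
```

Because GoEmotions examples can carry several emotions at once, any rule of this kind has to decide how to combine them, which is one way overlap between classes can arise.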

Training Configuration

Setting	Value
Model Base	Longformer (4096)
Max Sequence Length	1024 tokens
Epochs	4
Batch Size	8
Gradient Accumulation	4
Optimizer	`adamw_torch`
Learning Rate	`2e-5`
Scheduler	Linear
Mixed Precision	FP16
Weight Decay	0.01
Warmup Proportion	0.1
Early Stopping	patience=5, threshold=0.01
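
The settings above can be collected into keyword arguments, shown here as a sketch rather than the author's actual training script. The key names follow the Hugging Face `TrainingArguments` and `EarlyStoppingCallback` parameters; treating "Batch Size" as the per-device batch size and "Warmup Proportion" as `warmup_ratio` is an assumption, and the dataset size in the usage line is made up.

```python
import math

# Values copied from the table above. Reading "Batch Size" as the
# per-device batch size is an assumption.
training_kwargs = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch",
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "fp16": True,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}
early_stopping_kwargs = {
    "early_stopping_patience": 5,
    "early_stopping_threshold": 0.01,
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def approx_schedule(num_examples, kwargs, num_devices=1):
    """Approximate (total optimizer steps, warmup steps) for a dataset size."""
    per_step = effective_batch_size(kwargs, num_devices)
    steps_per_epoch = math.ceil(num_examples / per_step)
    total = steps_per_epoch * kwargs["num_train_epochs"]
    warmup = round(total * kwargs["warmup_ratio"])
    return total, warmup


print(effective_batch_size(training_kwargs))  # 8 * 4 * 1 = 32
# Hypothetical dataset size, for illustration only.
total_steps, warmup_steps = approx_schedule(32_000, training_kwargs)
print(total_steps, warmup_steps)
```

With `transformers` installed, these dictionaries could be passed as `TrainingArguments(output_dir="out", **training_kwargs)` and `EarlyStoppingCallback(**early_stopping_kwargs)`, where `"out"` is a placeholder path. The 1024-token maximum sequence length is applied at tokenization time rather than through these arguments.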

Final Evaluation Metrics

Metric	Score
Accuracy	0.671
F1 Score (Macro)	0.642
F1 Score (Weighted)	0.673
Precision (Macro)	0.642
Recall (Macro)	0.646
Loss	0.882
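
Macro and weighted F1 differ in the table because macro F1 averages the per-class scores equally, while weighted F1 weights each class by its number of true examples. The sketch below illustrates the difference in plain Python; the function names and the toy labels are made up and are not the evaluation code or data behind the reported scores.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, num_classes=5):
    """F1 score for each class index in 0..num_classes-1."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, num_classes)) / num_classes


def weighted_f1(y_true, y_pred, num_classes=5):
    """Mean of per-class F1 scores weighted by true-class frequency."""
    scores = per_class_f1(y_true, y_pred, num_classes)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in range(num_classes)) / len(y_true)


# Toy labels, for illustration only.
y_true = [0, 1, 2, 2, 3, 4, 4, 4]
y_pred = [0, 2, 2, 2, 3, 4, 4, 3]
print(round(macro_f1(y_true, y_pred), 3), round(weighted_f1(y_true, y_pred), 3))
```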

Use Cases

Tracking sentiment across Reddit posts, especially for news or trending headlines.
Analyzing long-form product reviews.
Building a sentiment dashboard for user forums or blogs.
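
For the dashboard use case, a rough sketch might classify a batch of posts and then summarize the predicted class indices. This assumes the model is published under the repository id shown on this page and loads with the standard `transformers` sequence-classification classes; `predict_indices` and `summarize` are illustrative helper names, and the imports inside `predict_indices` are deferred so the summary helper works without `torch` installed.

```python
from collections import Counter

# Assumed repository id, taken from the page this card appears on.
MODEL_ID = "spacesedan/reddit-sentiment-analysis-longformer"
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def predict_indices(texts, model_id=MODEL_ID, max_length=1024):
    """Return one predicted class index (0..4) per input text."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # Truncate to the 1024-token training length listed in the config table.
    batch = tokenizer(
        texts,
        truncation=True,
        max_length=max_length,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()


def summarize(indices):
    """Count predictions per label and compute the mean class index."""
    counts = Counter(indices)
    distribution = {LABELS[i]: counts.get(i, 0) for i in range(len(LABELS))}
    mean_index = sum(indices) / len(indices) if indices else None
    return {"distribution": distribution, "mean_index": mean_index}


# Summarizing made-up predictions; replace with predict_indices(posts).
print(summarize([0, 4, 4, 2]))
```

A mean class index above 2 indicates that the batch leans positive overall, which can be tracked over time for a thread or headline.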

Limitations

The model is trained on English text only.
Sentiment is subjective, especially in edge cases such as sarcasm or dark humor, so some predictions will be debatable.
The 5-class mapping from GoEmotions is heuristic and may introduce overlap between classes.

Acknowledgements

Special thanks to the original dataset creators:

Google (GoEmotions)
Yassir Acharki (Amazon Reviews fine-grained)
Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)

License

This model is available under the same license as the base model (Longformer) and is intended for research and educational use.

✅ Created and maintained by spacesedan

Model repository: spacesedan/reddit-sentiment-analysis-longformer