BanglaBERT Dual-Head Model for Sentiment and Sarcasm Detection
Overview
This repository contains a fine-tuned BanglaBERT model for dual-head multi-label classification — detecting both sentiment (positive, neutral, negative) and sarcasm (sarcastic, non-sarcastic) in Bangla social media text. The model is designed for low-resource NLP and is trained on a manually annotated dataset of 5,635 Bangla Facebook and YouTube comments related to Bangladesh’s performance in the 2023 ICC Cricket World Cup.
Model Architecture
Base Model: csebuetnlp/banglabert_small
Architecture: Transformer-based dual-head classification (an illustrative sketch follows the list of heads below)
- Head 1 → Sentiment Classification (3 classes)
- Head 2 → Sarcasm Detection (2 classes)
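The head implementation itself is not shipped with this card; the sketch below is only illustrative, assuming plain linear heads on top of the [CLS] representation of csebuetnlp/banglabert_small, with head sizes taken from the class counts above. The class name DualHeadBanglaBert and its constructor arguments are hypothetical.

import torch.nn as nn
from transformers import AutoModel

class DualHeadBanglaBert(nn.Module):
    """Shared BanglaBERT encoder with separate sentiment and sarcasm heads (illustrative sketch)."""
    def __init__(self, base_name="csebuetnlp/banglabert_small", num_sentiment=3, num_sarcasm=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_name)
        hidden = self.encoder.config.hidden_size
        self.sentiment_head = nn.Linear(hidden, num_sentiment)
        self.sarcasm_head = nn.Linear(hidden, num_sarcasm)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.sentiment_head(cls), self.sarcasm_head(cls)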
Training Techniques:
- Focal Loss with class weighting to handle severe class imbalance (sketched after this list)
- Multilabel stratified K-fold cross-validation
- Domain-specific data preprocessing for Bangla text
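The exact loss code is not published with this card; the sketch below shows one common way to combine focal loss with inverse-frequency class weights, the combination the results section credits for the minority-class gains. The gamma value and the label counts are hypothetical placeholders.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights, gamma=2.0):
    # Multi-class focal loss: -w_t * (1 - p_t)^gamma * log(p_t)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    target_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    target_probs = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    weights = class_weights[targets]              # per-example weight from the true class
    return (-weights * (1.0 - target_probs) ** gamma * target_log_probs).mean()

# Inverse-frequency ("balanced") weights from label counts; replace with the real per-class counts
counts = torch.tensor([900.0, 700.0, 2400.0])
class_weights = counts.sum() / (len(counts) * counts)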
Dataset
Size: 5,635 manually annotated comments
Labels:
- Sentiment: Positive, Neutral, Negative
- Sarcasm: Sarcastic, Non-Sarcastic
Source: Publicly available Facebook & YouTube comments (2023 ICC Cricket World Cup)
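The cross-validation listed under Training Techniques has to respect both label sets at once. A hedged sketch using MultilabelStratifiedKFold from the iterative-stratification package, with hypothetical label arrays, fold count, and random seed, could look like this:

import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

# Hypothetical per-comment labels: sentiment in {0,1,2}, sarcasm in {0,1}
sentiment = np.random.randint(0, 3, size=5635)
sarcasm = np.random.randint(0, 2, size=5635)

# One-hot both label sets and stack them so folds preserve the joint distribution
y = np.concatenate([np.eye(3)[sentiment], np.eye(2)[sarcasm]], axis=1)
X = np.arange(len(y)).reshape(-1, 1)              # stand-in for the comment texts

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(mskf.split(X, y)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")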
Performance
| Task | Weighted F1 | Class-wise F1 (minority classes) | Class-wise F1 (majority class) |
|---|---|---|---|
| Sentiment | 0.89 | Neutral: 0.69, Positive: 0.73 | Negative: 0.96 |
| Sarcasm Detection | 0.84 | Sarcastic: 0.60 | Non-Sarcastic: 0.91 |
Key Gains:
- +0.20 F1 improvement for Neutral sentiment
- +0.18 F1 improvement for Sarcastic content
- Attributed to focal loss + inverse class weighting
Example Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
model = AutoModelForSequenceClassification.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")

# Example Bangla text (roughly: "May the 2023 educational tour from Bangladesh to India be a success")
text = "শিক্ষা সফর 2023 বাংলাদেশ টু ইন্ডিয়া সফল হোক"

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Raw logits
print(outputs.logits)
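The card describes two heads, but the snippet above only prints a raw logits tensor. Continuing that snippet, and assuming (this is not confirmed by the card) that the checkpoint exposes a single five-way logits tensor laid out as three sentiment scores followed by two sarcasm scores, the predictions could be decoded as follows; check the checkpoint's config.id2label before relying on this layout.

# Hypothetical decoding: assumes logits = [sentiment (3) | sarcasm (2)]
sentiment_labels = ["positive", "neutral", "negative"]   # label order is an assumption
sarcasm_labels = ["sarcastic", "non-sarcastic"]          # label order is an assumption

logits = outputs.logits                                  # shape (1, 5) under the assumed layout
sentiment_probs = torch.softmax(logits[:, :3], dim=-1)
sarcasm_probs = torch.softmax(logits[:, 3:], dim=-1)

print("Sentiment:", sentiment_labels[sentiment_probs.argmax(dim=-1).item()])
print("Sarcasm:", sarcasm_labels[sarcasm_probs.argmax(dim=-1).item()])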
Intended Use
- Sports analytics: Track fan sentiment and sarcasm during live matches
- Social media monitoring: Identify sarcastic backlash and emotional trends
- Brand reputation analysis: Understand nuanced customer feedback in Bangla
Limitations
- Domain-specific: Trained on cricket-related data; performance may drop in other contexts
- Context sensitivity: Some sarcasm requires cultural or multimodal cues (e.g., emojis)
- Not suitable for toxic speech moderation without additional fine-tuning
Citation
If you use this model in your work, please cite:
@misc{hoque2025banglabertsentimentsarcasm,
  author    = {Hoque, Arshadul and Sultana, Nasrin and Rasel, Risul Islam},
  title     = {Bangla Sentiment and Sarcasm Detection: Reactions to Bangladesh's 2023 World Cup},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ahs95/sentiment-sarcasm-detection-BanglaBERT},
  note      = {Manuscript under review}
}