BanglaBERT Dual-Head Model for Sentiment and Sarcasm Detection
Overview
This repository contains a fine-tuned BanglaBERT model for dual-head multi-label classification — detecting both sentiment (positive, neutral, negative) and sarcasm (sarcastic, non-sarcastic) in Bangla social media text. The model is designed for low-resource NLP and is trained on a manually annotated dataset of 5,635 Bangla Facebook and YouTube comments related to Bangladesh’s performance in the 2023 ICC Cricket World Cup.
Model Architecture
Base Model: csebuetnlp/banglabert_small
Architecture: Transformer-based dual-head classification (an illustrative sketch follows the list of heads below)
- Head 1 → Sentiment Classification (3 classes)
- Head 2 → Sarcasm Detection (2 classes)
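The head implementation itself is not shipped with this card; the sketch below is only illustrative, assuming plain linear heads on top of the [CLS] representation of csebuetnlp/banglabert_small, with head sizes taken from the class counts above. The class name DualHeadBanglaBert and its constructor arguments are hypothetical.

import torch.nn as nn
from transformers import AutoModel

class DualHeadBanglaBert(nn.Module):
    """Shared BanglaBERT encoder with separate sentiment and sarcasm heads (illustrative sketch)."""
    def __init__(self, base_name="csebuetnlp/banglabert_small", num_sentiment=3, num_sarcasm=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_name)
        hidden = self.encoder.config.hidden_size
        self.sentiment_head = nn.Linear(hidden, num_sentiment)
        self.sarcasm_head = nn.Linear(hidden, num_sarcasm)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.sentiment_head(cls), self.sarcasm_head(cls)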
Training Techniques:
- Focal Loss with class weighting to handle severe class imbalance (sketched after this list)
- Multilabel stratified K-fold cross-validation
- Domain-specific data preprocessing for Bangla text
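The exact loss code is not published with this card; the sketch below shows one common way to combine focal loss with inverse-frequency class weights, the combination the results section credits for the minority-class gains. The gamma value and the label counts are hypothetical placeholders.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights, gamma=2.0):
    # Multi-class focal loss: -w_t * (1 - p_t)^gamma * log(p_t)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    target_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    target_probs = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    weights = class_weights[targets]              # per-example weight from the true class
    return (-weights * (1.0 - target_probs) ** gamma * target_log_probs).mean()

# Inverse-frequency ("balanced") weights from label counts; replace with the real per-class counts
counts = torch.tensor([900.0, 700.0, 2400.0])
class_weights = counts.sum() / (len(counts) * counts)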
Dataset
Size: 5,635 manually annotated comments
Labels:
- Sentiment: Positive, Neutral, Negative
- Sarcasm: Sarcastic, Non-Sarcastic
Source: Publicly available Facebook & YouTube comments (2023 ICC Cricket World Cup)
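The cross-validation listed under Training Techniques has to respect both label sets at once. A hedged sketch using MultilabelStratifiedKFold from the iterative-stratification package, with hypothetical label arrays, fold count, and random seed, could look like this:

import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

# Hypothetical per-comment labels: sentiment in {0,1,2}, sarcasm in {0,1}
sentiment = np.random.randint(0, 3, size=5635)
sarcasm = np.random.randint(0, 2, size=5635)

# One-hot both label sets and stack them so folds preserve the joint distribution
y = np.concatenate([np.eye(3)[sentiment], np.eye(2)[sarcasm]], axis=1)
X = np.arange(len(y)).reshape(-1, 1)              # stand-in for the comment texts

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(mskf.split(X, y)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")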
Performance
| Task | Weighted F1 | Class-wise F1 (minority classes) | Class-wise F1 (majority class) |
|---|---|---|---|
| Sentiment | 0.89 | Neutral: 0.69, Positive: 0.73 | Negative: 0.96 |
| Sarcasm Detection | 0.84 | Sarcastic: 0.60 | Non-Sarcastic: 0.91 |
Key Gains:
- +0.20 F1 improvement for Neutral sentiment
- +0.18 F1 improvement for Sarcastic content
- Attributed to focal loss + inverse class weighting
Example Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
model = AutoModelForSequenceClassification.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")

# Example Bangla text (roughly: "May the 2023 educational tour from Bangladesh to India be a success")
text = "শিক্ষা সফর 2023 বাংলাদেশ টু ইন্ডিয়া সফল হোক"

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Raw logits
print(outputs.logits)
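The card describes two heads, but the snippet above only prints a raw logits tensor. Continuing that snippet, and assuming (this is not confirmed by the card) that the checkpoint exposes a single five-way logits tensor laid out as three sentiment scores followed by two sarcasm scores, the predictions could be decoded as follows; check the checkpoint's config.id2label before relying on this layout.

# Hypothetical decoding: assumes logits = [sentiment (3) | sarcasm (2)]
sentiment_labels = ["positive", "neutral", "negative"]   # label order is an assumption
sarcasm_labels = ["sarcastic", "non-sarcastic"]          # label order is an assumption

logits = outputs.logits                                  # shape (1, 5) under the assumed layout
sentiment_probs = torch.softmax(logits[:, :3], dim=-1)
sarcasm_probs = torch.softmax(logits[:, 3:], dim=-1)

print("Sentiment:", sentiment_labels[sentiment_probs.argmax(dim=-1).item()])
print("Sarcasm:", sarcasm_labels[sarcasm_probs.argmax(dim=-1).item()])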
Intended Use
- Sports analytics: Track fan sentiment and sarcasm during live matches
- Social media monitoring: Identify sarcastic backlash and emotional trends
- Brand reputation analysis: Understand nuanced customer feedback in Bangla
Limitations
- Domain-specific: Trained on cricket-related data; performance may drop in other contexts
- Context sensitivity: Some sarcasm requires cultural or multimodal cues (e.g., emojis)
- Not suitable for toxic speech moderation without additional fine-tuning
Citation
If you use this model in your work, please cite:
@misc{hoque2025banglabertsentimentsarcasm,
  author    = {Hoque, Arshadul and Sultana, Nasrin and Rasel, Risul Islam},
  title     = {Bangla Sentiment and Sarcasm Detection: Reactions to Bangladesh's 2023 World Cup},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ahs95/sentiment-sarcasm-detection-BanglaBERT},
  note      = {Manuscript under review}
}