BanglaBERT Dual-Head Model for Sentiment and Sarcasm Detection

Overview

This repository contains a fine-tuned BanglaBERT model for dual-head, multi-label classification: it jointly detects sentiment (positive, neutral, negative) and sarcasm (sarcastic, non-sarcastic) in Bangla social media text. The model targets low-resource NLP and was trained on a manually annotated dataset of 5,635 Bangla Facebook and YouTube comments about Bangladesh's performance in the 2023 ICC Cricket World Cup.

Model Architecture

  • Base Model: csebuetnlp/banglabert_small
  • Architecture: Transformer-based dual-head classification (a minimal sketch follows this list)
    • Head 1 → Sentiment Classification (3 classes)
    • Head 2 → Sarcasm Detection (2 classes)
  • Training Techniques:
    • Focal Loss with class weighting to handle severe class imbalance
    • Multilabel stratified K-fold cross-validation
    • Domain-specific data preprocessing for Bangla text
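
The dual-head setup and the focal-loss objective can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's exact training code: the class name DualHeadBanglaBERT, the focal_loss helper, and the alpha/gamma hyperparameters are placeholders introduced here for exposition.

import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadBanglaBERT(nn.Module):
    # Minimal sketch: a shared BanglaBERT encoder feeding two independent classification heads.
    def __init__(self, base_model="csebuetnlp/banglabert_small"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model)
        hidden = self.encoder.config.hidden_size
        self.sentiment_head = nn.Linear(hidden, 3)  # Positive / Neutral / Negative
        self.sarcasm_head = nn.Linear(hidden, 2)    # Sarcastic / Non-Sarcastic

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]           # [CLS]-token representation
        return self.sentiment_head(cls), self.sarcasm_head(cls)

def focal_loss(logits, targets, alpha, gamma=2.0):
    # Class-weighted focal loss: rare classes get a larger alpha, easy examples are down-weighted.
    log_pt = nn.functional.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()                               # probability of the true class
    return (-alpha[targets] * (1.0 - pt) ** gamma * log_pt).mean()

# Training would combine the two per-task losses (weights here are assumptions, not the paper's values):
# sent_logits, sarc_logits = model(input_ids, attention_mask)
# loss = focal_loss(sent_logits, sent_y, alpha=sent_weights) + focal_loss(sarc_logits, sarc_y, alpha=sarc_weights)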

Dataset

  • Size: 5,635 manually annotated comments
  • Labels:
    • Sentiment: Positive, Neutral, Negative
    • Sarcasm: Sarcastic, Non-Sarcastic
  • Source: Publicly available Facebook & YouTube comments (2023 ICC Cricket World Cup)

Performance

Task              | Weighted F1 | Class-wise F1 (Minority)       | Class-wise F1 (Majority)
Sentiment         | 0.89        | Neutral: 0.69, Positive: 0.73  | Negative: 0.96
Sarcasm Detection | 0.84        | Sarcastic: 0.60                | Non-Sarcastic: 0.91

Key Gains:

  • +0.20 F1 improvement for Neutral sentiment
  • +0.18 F1 improvement for Sarcastic content
  • Attributed to focal loss combined with inverse class weighting (a weight-computation sketch follows this list)
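
As one hedged illustration of the weighting scheme named above, inverse-frequency class weights can be computed from the training label counts and passed to the loss. The helper inverse_class_weights and its normalization are assumptions for exposition; the exact scheme used for this model is not specified here.

from collections import Counter
import torch

def inverse_class_weights(labels, num_classes):
    # Hypothetical helper: weight each class by its inverse frequency,
    # normalized so the weights sum to num_classes.
    counts = Counter(labels)
    freqs = torch.tensor([counts.get(c, 1) for c in range(num_classes)], dtype=torch.float)
    weights = 1.0 / freqs
    return weights * num_classes / weights.sum()

# e.g. sentiment_weights = inverse_class_weights(train_sentiment_labels, num_classes=3)
# and then passed as the alpha argument of the focal loss sketched earlier.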

Example Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
model = AutoModelForSequenceClassification.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
model.eval()

# Example Bangla text ("May the 2023 educational tour from Bangladesh to India be a success")
text = "শিক্ষা সফর 2023 বাংলাদেশ টু ইন্ডিয়া সফল হোক"

# Tokenize the input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Raw logits for the classification heads
print(outputs.logits)
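
How the raw logits map to labels depends on how the two heads are laid out in the exported checkpoint. As a hedged sketch only, the snippet below assumes a single concatenated logits tensor whose first three positions are the sentiment classes and whose last two are the sarcasm classes; the SENTIMENT_LABELS and SARCASM_LABELS orderings are assumptions, so verify them against the model's id2label mapping before relying on the output.

import torch.nn.functional as F

# Hypothetical label layout; confirm against the checkpoint's config before use.
SENTIMENT_LABELS = ["Positive", "Neutral", "Negative"]
SARCASM_LABELS = ["Sarcastic", "Non-Sarcastic"]

logits = outputs.logits.squeeze(0)          # assumed shape: (5,) = 3 sentiment + 2 sarcasm logits
sentiment_probs = F.softmax(logits[:3], dim=-1)
sarcasm_probs = F.softmax(logits[3:], dim=-1)

print("Sentiment:", SENTIMENT_LABELS[sentiment_probs.argmax().item()], sentiment_probs.tolist())
print("Sarcasm:  ", SARCASM_LABELS[sarcasm_probs.argmax().item()], sarcasm_probs.tolist())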

Intended Use

  • Sports analytics: Track fan sentiment and sarcasm during live matches
  • Social media monitoring: Identify sarcastic backlash and emotional trends
  • Brand reputation analysis: Understand nuanced customer feedback in Bangla

Limitations

  • Domain-specific: Trained on cricket-related data; performance may drop in other contexts
  • Context sensitivity: Some sarcasm requires cultural or multimodal cues (e.g., emojis)
  • Not suitable for toxic speech moderation without additional fine-tuning

Citation

If you use this model in your work, please cite:

@misc{hoque2025banglabertsentimentsarcasm,
  author = {Arshadul Hoque and Nasrin Sultana and Risul Islam Rasel},
  title = {Bangla Sentiment and Sarcasm Detection: Reactions to Bangladesh's 2023 World Cup},
  note = {Manuscript under review},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ahs95/sentiment-sarcasm-detection-BanglaBERT}
}