ahs95 committed
Commit 83f8d5b · verified · 1 Parent(s): 8b3d44a

Update README.md

Files changed (1)
  1. README.md +80 -41
README.md CHANGED
@@ -4,69 +4,108 @@ language:
  - bn
  metrics:
  - f1
+ - precision
+ - recall
  base_model:
  - csebuetnlp/banglabert_small
  pipeline_tag: text-classification
  library_name: transformers
  tags:
- - bangla-nlp
+ - bangla
  - sentiment-analysis
  - sarcasm-detection
+ - low-resource
+ - sports-analytics
+ - social-media
  ---
 
- # Bangla Sentiment and Sarcasm Detection Model

- This repository hosts the trained model for detecting sentiment and sarcasm in Bangla social media comments, specifically focusing on reactions to Bangladesh's performance in the 2023 ICC Cricket World Cup. The model is designed to classify comments into sentiment categories (positive, negative, neutral) and identify sarcasm (sarcastic, non-sarcastic).

- ## 📚 Overview

- The model is based on a dual-head transformer architecture fine-tuned using **BanglaBERT**. It addresses class imbalance through focal loss and employs multilabel stratified k-fold cross-validation for robust evaluation.

- ## 🧠 Key Features

- - **Manually Annotated Dataset**: Utilizes a comprehensive collection of **5,635** Bangla comments.
- - **Custom Dual-Head Classification Model**: Jointly detects sentiment and sarcasm.
- - **Focal Loss Integration**: Effectively manages class imbalance in the dataset.
- - **Multilabel Stratified K-Fold Cross-Validation**: Ensures reliable model evaluation.
- - **Interactive Gradio Interface**: Provides real-time predictions and user interaction.
- - **Open Source**: Publicly available [code and dataset](https://github.com/ahs95/sentiment-analysis-cwcbd23) for reproducibility and further research.

- ## 📁 Dataset

- The dataset used for training is the largest publicly available collection of Bangla comments focused on sentiment and sarcasm detection:

- - **Source**: Social media comments related to Bangladesh’s 2023 ICC Cricket World Cup performance.
- - **Size**: **5,635** manually annotated samples.
- - **Labels**:
- - **Sentiment**: Positive / Negative / Neutral
- - **Sarcasm**: Sarcastic / Non-sarcastic

- ## 🤖 Model Architecture

- - **Base Model**: BanglaBERT
- - **Custom Head**: Dual-output head for multi-task classification.
- - **Loss Function**: Combined focal loss for both tasks.
- - **Training Strategy**: Multilabel stratified k-fold cross-validation to enhance model performance and reliability.

- ## 🚀 Usage

- To use the model for inference, you can follow these steps:

- 1. Install the required libraries:
- ```bash
- pip install transformers torch
- ```

- 2. Load the model:
- ```python
- from transformers import AutoModelForSequenceClassification, AutoTokenizer

- model = AutoModelForSequenceClassification.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
- tokenizer = AutoTokenizer.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
- ```

- 3. Make predictions:
- ```python
- inputs = tokenizer("মায়ের দোয়া ক্রিকেট বোর্ডে আপনাকে স্বাগতম", return_tensors="pt")
- outputs = model(**inputs)
- ```
 
+ # BanglaBERT Dual-Head Model for Sentiment and Sarcasm Detection

+ ## Overview

+ This repository contains a **fine-tuned BanglaBERT model** for **dual-head multi-label classification**, detecting both **sentiment** (positive, neutral, negative) and **sarcasm** (sarcastic, non-sarcastic) in Bangla social media text.
+ The model is designed for **low-resource NLP** and is trained on a manually annotated dataset of **5,635 Bangla Facebook and YouTube comments** related to Bangladesh’s performance in the **2023 ICC Cricket World Cup**.

+ ## Model Architecture

+ * **Base Model:** [csebuetnlp/banglabert_small](https://huggingface.co/csebuetnlp/banglabert_small)
+ * **Architecture:** Transformer-based dual-head classification (a minimal sketch follows below)
+   * Head 1: Sentiment Classification (3 classes)
+   * Head 2: Sarcasm Detection (2 classes)
+ * **Training Techniques:**
+   * Focal Loss with class weighting to handle **severe data imbalance**
+   * Multilabel stratified K-fold cross-validation
+   * Domain-specific data preprocessing for Bangla text

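+ As a rough illustration of the dual-head setup and focal loss described above, the training-time architecture can be sketched as follows. This is a minimal sketch only: the class names, head layout, and focal-loss variant are assumptions for illustration, not the exact code behind this checkpoint.

+ ```python
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel

+ class FocalLoss(nn.Module):
+     """Cross-entropy scaled by (1 - p_t)^gamma, with optional per-class weights."""
+     def __init__(self, gamma=2.0, weight=None):
+         super().__init__()
+         self.gamma = gamma
+         self.ce = nn.CrossEntropyLoss(weight=weight, reduction="none")

+     def forward(self, logits, targets):
+         ce = self.ce(logits, targets)
+         pt = torch.exp(-ce)  # probability the model assigned to the true class
+         return ((1.0 - pt) ** self.gamma * ce).mean()

+ class DualHeadBanglaBert(nn.Module):
+     """Shared BanglaBERT encoder with separate sentiment and sarcasm heads."""
+     def __init__(self, base="csebuetnlp/banglabert_small"):
+         super().__init__()
+         self.encoder = AutoModel.from_pretrained(base)
+         hidden = self.encoder.config.hidden_size
+         self.sentiment_head = nn.Linear(hidden, 3)  # positive / neutral / negative
+         self.sarcasm_head = nn.Linear(hidden, 2)    # sarcastic / non-sarcastic

+     def forward(self, input_ids, attention_mask=None, token_type_ids=None):
+         out = self.encoder(input_ids=input_ids, attention_mask=attention_mask,
+                            token_type_ids=token_type_ids)
+         cls = out.last_hidden_state[:, 0]  # [CLS] representation
+         return self.sentiment_head(cls), self.sarcasm_head(cls)

+ # During training, one focal loss per head is computed and the two are summed:
+ #   loss = FocalLoss()(sent_logits, sent_labels) + FocalLoss()(sarc_logits, sarc_labels)
+ ```
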
+ ## Dataset
 
+ * **Size:** 5,635 manually annotated comments
+ * **Labels:**
+   * Sentiment: Positive, Neutral, Negative
+   * Sarcasm: Sarcastic, Non-Sarcastic
+ * **Source:** Publicly available Facebook & YouTube comments (2023 ICC Cricket World Cup)

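+ For the multilabel stratified K-fold cross-validation listed under Training Techniques, the sentiment and sarcasm labels have to be stratified jointly when splitting these comments. A minimal sketch, assuming the `iterative-stratification` package and an illustrative one-hot label layout (not the dataset's actual schema):

+ ```python
+ import numpy as np
+ from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

+ # One row per comment; indicator columns:
+ # [positive, neutral, negative, sarcastic, non_sarcastic]
+ labels = np.array([
+     [0, 0, 1, 1, 0],  # negative, sarcastic
+     [1, 0, 0, 0, 1],  # positive, non-sarcastic
+     [0, 1, 0, 0, 1],  # neutral, non-sarcastic
+ ] * 20)
+ comments = np.arange(len(labels))  # stand-in for the comment texts

+ mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
+ for fold, (train_idx, val_idx) in enumerate(mskf.split(comments, labels)):
+     print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")
+ ```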
 
+ ## Performance

+ | Task | Weighted F1 | Class-wise F1 (Minority) | Class-wise F1 (Majority) |
+ | ----------------- | ----------- | ----------------------------- | ------------------------ |
+ | Sentiment | **0.89** | Neutral: 0.69, Positive: 0.73 | Negative: 0.96 |
+ | Sarcasm Detection | **0.84** | Sarcastic: 0.60 | Non-Sarcastic: 0.91 |

+ **Key Gains:**

+ * +0.20 F1 improvement for Neutral sentiment
+ * +0.18 F1 improvement for Sarcastic content
+ * Attributed to focal loss + inverse class weighting

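+ For reference, the weighted and class-wise F1 scores reported above can be computed with scikit-learn as shown below (the arrays are dummy values, not the model's actual predictions):

+ ```python
+ from sklearn.metrics import f1_score

+ # Dummy gold labels and predictions for the 3-class sentiment task
+ y_true = [2, 2, 0, 1, 2, 0, 2, 1]  # e.g. 0 = positive, 1 = neutral, 2 = negative
+ y_pred = [2, 2, 0, 0, 2, 1, 2, 1]

+ print(f1_score(y_true, y_pred, average="weighted"))  # single weighted F1
+ print(f1_score(y_true, y_pred, average=None))        # per-class F1, one value per label
+ ```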
 
+ ## Example Usage

+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch

+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")
+ model = AutoModelForSequenceClassification.from_pretrained("ahs95/sentiment-sarcasm-detection-BanglaBERT")

+ # Example Bangla text
+ text = "শিক্ষা সফর 2023 বাংলাদেশ টু ইন্ডিয়া সফল হোক"

+ # Tokenize
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)

+ # Raw logits
+ print(outputs.logits)
+ ```

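+ The raw logits still need to be mapped to labels. Continuing the snippet above, and assuming the checkpoint behaves as a standard single-head sequence-classification model (check `model.config.id2label` for the actual label order rather than hard-coding names):

+ ```python
+ import torch.nn.functional as F

+ probs = F.softmax(outputs.logits, dim=-1)           # logits -> probabilities
+ pred_id = int(probs.argmax(dim=-1))                 # highest-probability class index
+ print(pred_id, model.config.id2label.get(pred_id))  # label name stored in the config
+ ```
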
+ ## Intended Use
 
+ * **Sports analytics:** Track fan sentiment and sarcasm during live matches
+ * **Social media monitoring:** Identify sarcastic backlash and emotional trends
+ * **Brand reputation analysis:** Understand nuanced customer feedback in Bangla

+ ## Limitations

+ * Domain-specific: Trained on cricket-related data; performance may drop in other contexts
+ * Context sensitivity: Some sarcasm requires cultural or multimodal cues (e.g., emojis)
+ * Not suitable for toxic speech moderation without additional fine-tuning

+ ## Citation

+ If you use this model in your work, please cite:

+ ```bibtex
+ @misc{hoque2025banglabertsentimentsarcasm,
+   author    = {Arshadul Hoque and Nasrin Sultana and Risul Islam Rasel},
+   title     = {Bangla Sentiment and Sarcasm Detection: Reactions to Bangladesh's 2023 World Cup},
+   note      = {Manuscript under review},
+   year      = {2025},
+   publisher = {Hugging Face},
+   url       = {https://huggingface.co/ahs95/sentiment-sarcasm-detection-BanglaBERT}
+ }
+ ```