---
license: cc-by-4.0
tags:
- sentiment-classification
- telugu
- multilingual
- mbert
- baseline
language: te
datasets:
- DSL-13-SRMAP/TeSent_Benchmark-Dataset
model_name: mBERT_WOR
---

# mBERT_WOR: Telugu Sentiment Classification Model (Without Rationale)

## Model Overview

**mBERT_WOR** is a Telugu sentiment classification model based on Google's mBERT (BERT-base-multilingual-cased), fine-tuned for sentence-level sentiment analysis **without rationale supervision**. The acronym "WOR" stands for "Without Rationale": the model was trained using only the sentiment labels, not the human-annotated rationales provided in the TeSent_Benchmark-Dataset.

---

## Model Details

- **Architecture:** mBERT (BERT-base-multilingual-cased, 12 layers, ~178M parameters)
- **Pretraining Data:** Wikipedia articles in 104 languages (including Telugu), using the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives.
- **Fine-tuning Data:** TeSent_Benchmark-Dataset (Telugu only), using only the sentence-level sentiment labels (positive, negative, neutral); rationale annotations are not used in training.
- **Task:** Sentence-level sentiment classification (3-way)
- **Rationale Usage:** Not used during training or inference

---

## Intended Use

- **Primary Use:** Benchmarking Telugu sentiment classification on the TeSent_Benchmark-Dataset, especially as a **baseline** for comparing models trained with and without rationales.
- **Research Setting:** Designed for academic research, particularly in low-resource and explainable NLP settings.

---

## Performance and Limitations

- **Strengths:**
  - Leverages shared multilingual representations, enabling cross-lingual transfer and reasonable performance on Telugu even with limited labeled data.
  - Serves as a robust baseline for Telugu sentiment tasks.
- **Limitations:**
  - Not specifically optimized for Telugu morphology or syntax, which may limit its ability to capture fine-grained, language-specific sentiment cues.
  - May underperform Telugu-specialized models such as IndicBERT or L3Cube-Telugu-BERT, especially on nuanced or idiomatic expressions.
  - Because rationales are not used, the model cannot provide explicit explanations for its predictions.

---

## Training Data

- **Dataset:** [TeSent_Benchmark-Dataset](https://github.com/DSL-13-SRMAP/TeSent_Benchmark-Dataset)
- **Data Used:** Only the **Content** (Telugu sentence) and **Label** (sentiment label) columns; **rationale** annotations are ignored for mBERT_WOR training (see the fine-tuning sketch at the end of this card).

---

## Language Coverage

- **Language:** Telugu (the only language in the dataset)
- **Note:** While mBERT is a multilingual model, this implementation and evaluation are strictly for Telugu sentiment classification.

---

## Citation and More Details

For the detailed experimental setup, evaluation metrics, and comparisons with rationale-based models, **please refer to our paper**.

---

## License

Released under [CC BY 4.0](LICENSE).
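---

## Example Usage

A minimal inference sketch using the Hugging Face `transformers` library. The repository id `DSL-13-SRMAP/mBERT_WOR` and the index-to-label mapping are illustrative assumptions, not confirmed by this card; check the released checkpoint's `config.json` for the actual values.

```python
# Minimal inference sketch for mBERT_WOR (3-way Telugu sentiment).
# Assumptions: the checkpoint is published as "DSL-13-SRMAP/mBERT_WOR"
# (hypothetical id) and output indices map to negative/neutral/positive.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "DSL-13-SRMAP/mBERT_WOR"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

sentence = "ఈ సినిమా చాలా బాగుంది"  # "This movie is very good"
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

labels = ["negative", "neutral", "positive"]  # assumed index order
print(labels[logits.argmax(dim=-1).item()])
```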
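---

## Fine-tuning Sketch (Without Rationales)

A sketch of how the "without rationale" setup described above can be reproduced: only the **Content** and **Label** columns are kept, and any rationale column is dropped before fine-tuning. The column names, `train` split name, integer label encoding, and hyperparameters are assumptions for illustration, not the authors' exact configuration.

```python
# WOR setup: fine-tune bert-base-multilingual-cased on sentiment labels only,
# discarding rationale annotations before training.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Dataset id is taken from this card's metadata; loadability via `datasets`
# and the "train" split name are assumptions.
raw = load_dataset("DSL-13-SRMAP/TeSent_Benchmark-Dataset")
keep = ("Content", "Label")  # assumed column names
raw = raw.remove_columns([c for c in raw["train"].column_names if c not in keep])

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def tokenize(batch):
    return tokenizer(batch["Content"], truncation=True, max_length=128)

# Assumes Label is already an integer class id (0/1/2); map string labels
# to ids first if the dataset stores them as text.
tokenized = raw.map(tokenize, batched=True).rename_column("Label", "labels")

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert_wor", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,  # enables default dynamic padding collator
)
trainer.train()
```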