---
license: cc-by-4.0
tags:
- sentiment-classification
- telugu
- muril
- indian-languages
- baseline
- tesent
language: te
datasets:
- DSL-13-SRMAP/TeSent_Benchmark-Dataset
model_name: MuRIL_WOR
---

# MuRIL_WOR: MuRIL Telugu Sentiment Classification Model (Without Rationale)

## Model Overview

**MuRIL_WOR** is a Telugu sentiment classification model based on **MuRIL (Multilingual Representations for Indian Languages)**, a transformer-based BERT model pretrained on 17 languages: 16 Indian languages, including Telugu, plus English.  
"WOR" in the model name stands for "**Without Rationale**", meaning this model is trained only with sentiment labels from the TeSent_Benchmark-Dataset and **does not use human-annotated rationales**.

---

## Model Details

- **Architecture:** MuRIL (BERT-base for Indian languages, multilingual)
- **Pretraining Data:** Large corpus of Telugu sentences from web, religious scripts, news data, etc.
- **Pretraining Objectives:** Masked Language Modeling (MLM) and Translation Language Modeling (TLM)
- **Fine-tuning Data:** [TeSent_Benchmark-Dataset](https://huggingface.co/datasets/dsl-13-srmap/tesent_benchmark-dataset), using only sentence-level sentiment labels (positive, negative, neutral); rationale annotations are disregarded
- **Task:** Sentence-level sentiment classification (3-way)
- **Rationale Usage:** **Not used** during training or inference ("WOR" = Without Rationale)
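
The 3-way classification described above could be run along the following lines. This is a minimal sketch, not the authors' inference code: it assumes the checkpoint loads with `AutoModelForSequenceClassification` and that the label mapping is stored in the model config as `id2label`. The `model_id` argument is a placeholder for the Hub repo id or a local path, which this card does not state.

```python
import math


def top_label(logits, id2label):
    """Softmax over raw logits; return (label, probability) of the best class."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(text, model_id):
    """Classify one Telugu sentence.

    `model_id` is a placeholder (assumption): pass the Hub repo id or a
    local checkpoint path for MuRIL_WOR.
    """
    # Imported lazily so top_label() works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Assumption: the config carries the positive/negative/neutral mapping.
    return top_label(logits, model.config.id2label)
```

Reading the label names from `model.config.id2label` avoids hard-coding a class order, which this card does not specify.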

---

## Intended Use

- **Primary Use:** Benchmarking Telugu sentiment classification on the TeSent_Benchmark-Dataset, especially as a **baseline** for models trained without rationales
- **Research Setting:** Recommended for academic research in low-resource NLP settings, especially for informal, social media, or conversational Telugu data
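
When using the model as a baseline, predictions need to be scored against gold labels. The sketch below computes accuracy and macro-averaged F1 over the three classes in plain Python. The choice of these two metrics is an assumption for illustration, as this card does not list the metrics used, and the gold and predicted labels are toy values, not results.

```python
LABELS = ("positive", "negative", "neutral")


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Macro averaging weights each class equally, so a model that never predicts one class is penalized even if that class is rare.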

---

## Why MuRIL?

MuRIL is pre-trained specifically on Indian languages, which gives it a better grasp of Telugu morphology and syntax than generic multilingual models such as mBERT and XLM-R.  
Its pre-training favors informal web text, which suits informal, social media, and conversational Telugu tasks. Performance on formal or classical Telugu may be lower.

---

## Performance and Limitations

**Strengths:**  
- Superior understanding of Telugu compared to general multilingual models
- Excels in informal, web, or conversational Telugu sentiment tasks
- Robust baseline for Telugu sentiment classification

**Limitations:**  
- May underperform on formal or classical Telugu because the pre-training corpus favors informal web text
- Scope is limited to Telugu; the model is not intended for other languages or for highly formal text
- Since rationales are not used, the model cannot provide explicit explanations for its predictions

---

## Training Data

- **Dataset:** [TeSent_Benchmark-Dataset](https://huggingface.co/datasets/dsl-13-srmap/tesent_benchmark-dataset)
- **Data Used:** Only the **Content** (Telugu sentence) and **Label** (sentiment label) columns; **rationale** annotations are ignored for MuRIL_WOR training
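
The column selection described above can be sketched as follows. This is an illustration under assumptions, not the authors' preprocessing: it assumes the data is available as CSV and that the rationale column is named `Rationale` (only `Content` and `Label` are named in this card). The sample rows are placeholders.

```python
import csv
import io

# Columns kept for the "Without Rationale" setting.
KEEP = ("Content", "Label")


def load_wor_rows(csv_file):
    """Read rows and keep only the sentence and its sentiment label."""
    reader = csv.DictReader(csv_file)
    return [{col: row[col] for col in KEEP} for row in reader]


# Placeholder rows; the "Rationale" column name is an assumption.
sample = io.StringIO(
    "Content,Label,Rationale\n"
    "example sentence one,positive,sentence one\n"
    "example sentence two,negative,sentence two\n"
)
rows = load_wor_rows(sample)
print(rows)
```

Any other columns in the file are dropped, so the rationale annotations never reach the training loop.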

---

## Language Coverage

- **Language:** Telugu (`te`)
- **Model Scope:** Strictly focused on monolingual Telugu sentiment classification

---

## Citation and More Details

For detailed experimental setup, evaluation metrics, and comparisons with rationale-based models, **please refer to our paper**.


---

## License

Released under [CC BY 4.0](LICENSE).