---
base_model:
- google/gemma-2-2b-it
datasets:
- saidines12/telugu_news_dataset
---



# Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B-it model, optimized for generating Telugu news headlines from article content. It was trained with supervised fine-tuning (SFT) on a Telugu news dataset.

## Model Details

### Model Description

- **Developed by:** Google (base model) with Telugu news fine-tuning
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2-2b-it

### Model Sources
- **Repository:** Hugging Face Hub
- **Base Model:** google/gemma-2-2b-it


## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # the default max_length is very short; leave room for a full headline
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Details

### Training Data
- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for headline generation task
- Articles spanning various news categories

### Training Procedure

#### Training Hyperparameters
- **Training regime:** FP16 mixed precision
- **Batch size:** 6 per device
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Maximum steps:** 20,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** every 20,000 steps
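Taken together, these settings give an effective batch size of 24 (6 per device × 4 accumulation steps). A sketch of the configuration using `transformers.TrainingArguments` field names; anything not listed in the card (e.g. the exact optimizer variant string) is an assumption:

```python
# Sketch of the training configuration described above, keyed by
# transformers.TrainingArguments field names. Values not stated in the
# card (e.g. "adamw_torch" as the AdamW variant) are assumptions.
training_config = {
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "max_steps": 20_000,
    "warmup_steps": 25,
    "fp16": True,                    # FP16 mixed precision
    "gradient_checkpointing": True,  # trade compute for memory
    "dataloader_num_workers": 8,     # parallel data loading
    "dataloader_pin_memory": True,
    "optim": "adamw_torch",          # AdamW
}

# Effective batch size seen by the optimizer per update step
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
```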

#### Hardware Specifications
- GPU training with gradient checkpointing
- Parallel data loading with 8 workers


## Evaluation

### ROUGE Score Comparison

| Metric  | Base Model | Finetuned Model | Improvement |
|---------|------------|-----------------|-------------|
| ROUGE-1 | 3.39       | 4.64            | +1.25       |
| ROUGE-2 | 0.26       | 0.41            | +0.15       |
| ROUGE-L | 3.38       | 4.63            | +1.25       |

### Prediction Comparison Using a Larger Model as Judge

| Category           | Count | Percentage |
|-------------------|-------|------------|
| Total samples     | 5962  | 100%       |
| Same predictions  | 3     | 0.05%      |
| Better predictions| 4610  | 77.32%     |
| Worse predictions | 1349  | 22.63%     |
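The percentages in the table follow directly from the counts:

```python
# Verify the table's percentages from the raw counts
total = 5962
counts = {"same": 3, "better": 4610, "worse": 1349}

percentages = {k: round(100 * v / total, 2) for k, v in counts.items()}
# {'same': 0.05, 'better': 77.32, 'worse': 22.63}
```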

### Evaluation Methods
- ROUGE scores for similarity between generated and reference headlines
- A larger judge model scoring each generated headline for appropriateness and relevance
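ROUGE-1, for reference, is the unigram-overlap F1 between a generated and a reference headline. A minimal pure-Python sketch (real evaluations typically use the `rouge_score` package, which also handles stemming and ROUGE-2/ROUGE-L):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference string."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```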


## Inference

#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.float16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```


#### Other optimizations

* _Flash Attention 2_

First make sure to install `flash-attn` in your environment `pip install flash-attn`

```diff
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
+   attn_implementation="flash_attention_2"
).to(0)
```

### Inputs and outputs

*   **Input:** A text prompt containing the headline-generation instruction
    followed by a Telugu news article.
*   **Output:** A short generated Telugu headline for the article.



## Technical Specifications

### Model Architecture and Objective
- Base architecture: Gemma-2
- Training objective: Supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory

### Software
- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration

## Uses

### Direct Use
This model is designed for generating Telugu news headlines from article content. It can be used by:
- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation

### Out-of-Scope Use
- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline generation tasks

## Bias, Risks, and Limitations
- May reflect biases present in Telugu news media
- Performance may vary based on news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines

### Recommendations
- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations