---
base_model:
- google/gemma-2-2b-it
datasets:
- saidines12/telugu_news_dataset
---
# Model Card for Gemma-2-2B-it Telugu News Headline Generator
This model is a fine-tuned version of Google's Gemma-2-2B-it instruction-tuned model, optimized for generating Telugu news headlines from article content. It was trained with supervised fine-tuning (SFT) on a Telugu news dataset.
## Model Details
### Model Description
- **Developed by:** Google (base model) with Telugu news fine-tuning
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2-2b-it
### Model Sources
- **Repository:** saidines12/telugu-news-headline-generation on the Hugging Face Hub
- **Base Model:** google/gemma-2-2b-it
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

# Instruction prefix used throughout this card, followed by the article text.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
## Training Details
### Training Data
- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for headline generation task
- Articles spanning various news categories
### Training Procedure
#### Training Hyperparameters
- **Training regime:** FP16 mixed precision
- **Batch size:** 6 per device
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Maximum steps:** 20,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** Every 20,000 steps
#### Hardware Specifications
- GPU training with gradient checkpointing
- Parallel data loading with 8 workers (see the training sketch below)
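The hyperparameters and hardware settings above map roughly onto the TRL `SFTTrainer` configuration sketched below. This is a minimal sketch assuming a recent TRL version; the dataset split, the column names (`article`, `headline`), and the exact prompt template are assumptions, not the actual training script for this checkpoint.

```python
# Minimal sketch of the SFT setup described above; not the exact training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
dataset = load_dataset("saidines12/telugu_news_dataset", split="train")

def format_example(example):
    # Assumed column names; adjust to the actual dataset schema.
    prompt = (
        "Generate relevant, interesting, factual short headline from this "
        "news article in telugu language\n " + example["article"]
    )
    return {"text": prompt + "\n" + example["headline"]}

dataset = dataset.map(format_example)

args = SFTConfig(
    output_dir="gemma-2-2b-telugu-headlines",
    dataset_text_field="text",
    per_device_train_batch_size=6,   # batch size 6 per device
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=20_000,
    warmup_steps=25,
    optim="adamw_torch",
    fp16=True,                       # FP16 mixed precision
    gradient_checkpointing=True,     # memory efficiency
    dataloader_num_workers=8,        # parallel data loading with 8 workers
    dataloader_pin_memory=True,
)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```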
## Evaluation
### ROUGE Score Comparison
| Metric | Base Model | Finetuned Model | Improvement |
|---------|------------|-----------------|-------------|
| ROUGE-1 | 3.39 | 4.64 | +1.26 |
| ROUGE-2 | 0.26 | 0.41 | +0.14 |
| ROUGE-L | 3.38 | 4.63 | +1.25 |
### Model Prediction Comparison (judged by a larger model)
| Category | Count | Percentage |
|-------------------|-------|------------|
| Total samples | 5962 | 100% |
| Same predictions | 3 | 0.05% |
| Better predictions| 4610 | 77.32% |
| Worse predictions | 1349 | 22.63% |
### Evaluation Methods
- ROUGE scores for headline similarity (see the sketch below)
- Headline appropriateness and relevance judged by a larger model
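For reference, the ROUGE comparison above can be reproduced with the `evaluate` library. This is a minimal sketch: the placeholder strings stand in for real generated and reference headlines, and scaling the scores to percentages is an assumption about how the table was produced.

```python
# pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# Placeholders: headlines generated by the model and the reference headlines
# from the held-out split of the dataset.
predictions = ["<generated Telugu headline>"]
references = ["<reference Telugu headline>"]

# Note: the default rouge_score tokenizer is oriented toward Latin-script text;
# for Telugu you may want to pass a custom `tokenizer=` callable to compute().
scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 2) for k, v in scores.items()})  # scaled to percentages
```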
## Inference
#### Running the model on a GPU using different precisions
* _Using `torch.float16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.float16)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.bfloat16)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
* _Using 4-bit precision_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Other optimizations
* _Flash Attention 2_
First, make sure `flash-attn` is installed in your environment: `pip install flash-attn`
```diff
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.float16,
+ attn_implementation="flash_attention_2"
).to(0)
```
### Inputs and outputs
* **Input:** Text string containing the headline-generation instruction followed by a Telugu news article.
* **Output:** A short generated Telugu headline for the article.
## Technical Specifications
### Model Architecture and Objective
- Base architecture: Gemma-2
- Training objective: Supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory
### Software
- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration
## Uses
### Direct Use
This model is designed for generating Telugu news headlines from article content. It can be used by:
- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation
### Out-of-Scope Use
- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline generation tasks
## Bias, Risks, and Limitations
- May reflect biases present in Telugu news media
- Performance may vary based on news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines
### Recommendations
- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations (a minimal sketch follows below)
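As a starting point for the content-filtering recommendation, here is a minimal sketch of a post-generation check. The word limit and blocklist are placeholder assumptions and are not shipped with the model; adapt them to your editorial policy.

```python
from typing import Iterable

def is_acceptable_headline(headline: str,
                           blocklist: Iterable[str] = (),
                           max_words: int = 20) -> bool:
    """Basic length and blocklist checks for a generated headline (placeholder rules)."""
    words = headline.split()
    if not words or len(words) > max_words:
        return False
    lowered = headline.lower()
    # Reject headlines containing any blocked term (case-insensitive substring match).
    return not any(term.lower() in lowered for term in blocklist)

# Keep only candidates that pass the automatic check before human review.
candidates = ["<generated headline 1>", "<generated headline 2>"]
approved = [h for h in candidates if is_acceptable_headline(h, blocklist=("<blocked term>",))]
```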