---
base_model:
- google/gemma-2-2b-it
datasets:
- saidines12/telugu_news_dataset
---

# Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B-it model, optimized for generating Telugu news headlines from article content. It was trained with Supervised Fine-Tuning (SFT) on a Telugu news dataset.

## Model Details

### Model Description

- **Developed by:** Google (base model), with Telugu news fine-tuning
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2-2b-it

### Model Sources

- **Repository:** Hugging Face Hub
- **Base Model:** google/gemma-2-2b-it

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

# Append the Telugu article text after the instruction prompt.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Details

### Training Data

- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for the headline generation task
- Articles spanning various news categories

### Training Procedure

#### Training Hyperparameters

- **Training regime:** FP16 mixed precision
- **Batch size:** 6 per device
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Maximum steps:** 20,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** every 20,000 steps

#### Hardware Specifications

- GPU training with gradient checkpointing
- Parallel data loading with 8 workers
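For reference, the hyperparameters above might map onto a TRL `SFTTrainer` run roughly as sketched below. This is a reconstruction under assumptions, not the original training script: the dataset text field name is illustrative, evaluation is omitted for brevity, and the API shown assumes a recent TRL version.

```python
# pip install trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("saidines12/telugu_news_dataset", split="train")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

args = SFTConfig(
    output_dir="telugu-news-headline-generation",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=20_000,
    warmup_steps=25,
    fp16=True,                    # FP16 mixed precision
    optim="adamw_torch",          # AdamW optimizer
    gradient_checkpointing=True,  # trade compute for memory
    dataloader_num_workers=8,     # parallel data loading
    dataloader_pin_memory=True,   # pinned memory for faster host-to-GPU copies
    dataset_text_field="text",    # illustrative; use the dataset's actual field name
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```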
## Evaluation

### ROUGE Score Comparison

| Metric  | Base Model | Fine-tuned Model | Improvement |
|---------|------------|------------------|-------------|
| ROUGE-1 | 3.39       | 4.64             | +1.25       |
| ROUGE-2 | 0.26       | 0.41             | +0.15       |
| ROUGE-L | 3.38       | 4.63             | +1.25       |

### Prediction Comparison Using a Larger Model as Judge

| Category           | Count | Percentage |
|--------------------|-------|------------|
| Total samples      | 5,962 | 100%       |
| Same predictions   | 3     | 0.05%      |
| Better predictions | 4,610 | 77.32%     |
| Worse predictions  | 1,349 | 22.63%     |

### Evaluation Methods

- ROUGE scores for headline similarity
- A larger judge model's custom metrics for headline appropriateness and relevance

## Inference

#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.float16,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
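The snippets above rely on default generation settings. Since headlines are short, it may help to cap the number of new tokens and decode only the generated portion of the output; a minimal sketch follows (the values are illustrative, not tuned):

```python
import torch

# Assumes `model`, `tokenizer`, and `input_ids` from any of the snippets above.
with torch.no_grad():
    outputs = model.generate(
        **input_ids,
        max_new_tokens=32,  # headlines are short; cap generation length
        do_sample=False,    # greedy decoding for a deterministic headline
    )

# Strip the prompt tokens so only the generated headline is decoded.
prompt_length = input_ids["input_ids"].shape[-1]
headline = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(headline)
```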
#### Other optimizations

* _Flash Attention 2_

First make sure to install `flash-attn` in your environment: `pip install flash-attn`

```diff
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
+   attn_implementation="flash_attention_2"
).to(0)
```

### Inputs and outputs

* **Input:** Text string, such as a Telugu news article preceded by the headline-generation prompt.
* **Output:** A generated Telugu-language headline for the input article.

## Technical Specifications

### Model Architecture and Objective

- Base architecture: Gemma-2
- Training objective: supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory

### Software

- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration

## Uses

### Direct Use

This model is designed for generating Telugu news headlines from article content. It can be used by:

- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation

### Out-of-Scope Use

- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline-generation tasks

## Bias, Risks, and Limitations

- May reflect biases present in Telugu news media
- Performance may vary by news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines

### Recommendations

- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations (a minimal illustration follows below)
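As a minimal illustration of the last recommendation, a hypothetical post-processing filter is sketched below. The blocklist and length limit are placeholders, not a vetted safety mechanism:

```python
def passes_basic_checks(headline: str, blocklist: set[str], max_words: int = 15) -> bool:
    """Reject empty, overlong, or blocklisted headlines before publishing."""
    words = headline.split()
    if not words or len(words) > max_words:
        return False
    # Reject headlines containing any blocklisted term.
    return not any(term in headline for term in blocklist)

# Usage: only surface headlines that pass the checks.
blocklist = {"example-banned-term"}  # fill with terms your editorial policy prohibits
if passes_basic_checks(headline, blocklist):
    print(headline)
```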