---
base_model:
- google/gemma-2-2b-it
datasets:
- saidines12/telugu_news_dataset
---

# Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B model (`google/gemma-2-2b-it`), optimized for generating Telugu news headlines from article content. It was trained with supervised fine-tuning (SFT) on the `saidines12/telugu_news_dataset` dataset.

## Model Details

### Model Description

- **Developed by:** saidines12 (fine-tuning), building on Google's Gemma 2 base model
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2-2b-it

### Model Sources
- **Repository:** saidines12/telugu-news-headline-generation (Hugging Face Hub)
- **Base Model:** google/gemma-2-2b-it

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "saidines12/telugu-news-headline-generation"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Instruction prompt followed by the article text.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")

# max_new_tokens=64 is an illustrative cap on headline length; adjust as needed.
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens so the prompt is not echoed back.
prompt_length = inputs["input_ids"].shape[1]
headline = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(headline)
```
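
The prompt format used in the example above can be wrapped in small helpers. The following is a minimal sketch: `build_prompt` and `extract_headline` are hypothetical names introduced here for illustration, not part of the model repository, and keeping only the first non-empty line of the output is an assumption about what a headline looks like.

```python
# Instruction string copied from the usage example in this card.
INSTRUCTION = (
    "Generate relevant, interesting, factual short headline "
    "from this news article in telugu language"
)


def build_prompt(article: str) -> str:
    """Join instruction and article with a newline and a space, as in the example."""
    return f"{INSTRUCTION}\n {article.strip()}"


def extract_headline(decoded: str, prompt: str) -> str:
    """Drop an echoed prompt (if present) and return the first non-empty line.

    ASSUMPTION: the headline is the first non-empty line of the generated text.
    """
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


prompt = build_prompt("  <Your Telugu news article text here>  ")
print(prompt)
```

Pass the result of `build_prompt` to the tokenizer in place of the hand-written `text` string, and run `extract_headline` on the decoded output if it still contains the prompt.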

## Training Details

### Training Data
- Telugu news articles paired with their headlines (`saidines12/telugu_news_dataset`)
- Data cleaned and preprocessed for the headline generation task
- Articles spanning various news categories

### Training Procedure

#### Training Hyperparameters
- **Training regime:** FP16 mixed precision
- **Batch size:** 6 per device
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Maximum steps:** 20,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** Every 20,000 steps (that is, once, at the end of training)
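
The batch size and gradient accumulation settings above combine into an effective batch size per optimizer step. The sketch below works through that arithmetic. The `TrainingSetup` class is a hypothetical container for illustration, and `num_devices=1` is an assumption, since this card does not state how many GPUs were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    per_device_batch_size: int = 6        # from the hyperparameter list
    gradient_accumulation_steps: int = 4  # from the hyperparameter list
    max_steps: int = 20_000               # from the hyperparameter list
    num_devices: int = 1                  # ASSUMPTION: not stated in this card

    @property
    def effective_batch_size(self) -> int:
        """Sequences contributing to each optimizer update."""
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    @property
    def sequences_processed(self) -> int:
        """Upper bound on sequences processed over the run, counting repeats."""
        return self.effective_batch_size * self.max_steps


setup = TrainingSetup()
print(setup.effective_batch_size)  # 24
print(setup.sequences_processed)   # 480000
```

With more than one device, the effective batch size scales linearly; for example, `TrainingSetup(num_devices=2).effective_batch_size` gives 48.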

#### Hardware Specifications
- GPU training with gradient checkpointing
- Parallel data loading with 8 workers

## Evaluation

### ROUGE Score Comparison

| Metric  | Base Model | Finetuned Model | Improvement |
|---------|------------|-----------------|-------------|
| ROUGE-1 | 3.39       | 4.64            | +1.26       |
| ROUGE-2 | 0.26       | 0.41            | +0.14       |
| ROUGE-L | 3.38       | 4.63            | +1.25       |
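
For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated headline and the reference headline. The following is a minimal sketch of the ROUGE-N F1 computation. It assumes plain whitespace tokenization and is not necessarily the scoring script used to produce the table above, which this card does not specify. The function returns a fraction between 0 and 1.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N F1 between two strings, using whitespace tokens (an assumption)."""
    ref = ngram_counts(reference.split(), n)
    cand = ngram_counts(candidate.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("a b c d", "a b x y", n=1))  # 0.5
```

Whitespace tokenization keeps Telugu words intact; tokenizers that drop non-Latin characters would give different scores on Telugu text.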

### Model Prediction Comparison Using a Larger Model as Evaluator

| Category           | Count | Percentage |
|--------------------|-------|------------|
| Total samples      | 5962  | 100%       |
| Same predictions   | 3     | 0.05%      |
| Better predictions | 4610  | 77.32%     |
| Worse predictions  | 1349  | 22.63%     |
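
The percentages in the table above follow directly from the counts. The sketch below reproduces them, assuming each sample received exactly one verdict label. The label strings and the `summarize_verdicts` helper are hypothetical; the evaluator's actual output format is not described in this card.

```python
from collections import Counter


def summarize_verdicts(verdicts):
    """Return {label: (count, percentage)} for per-sample verdict labels."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {
        label: (counts[label], round(100 * counts[label] / total, 2))
        for label in ("same", "better", "worse")
    }


# Rebuild the verdict list from the counts reported in the table.
verdicts = ["same"] * 3 + ["better"] * 4610 + ["worse"] * 1349
summary = summarize_verdicts(verdicts)
print(summary)
```

The three categories sum to the 5962 total samples, so no sample is left unclassified.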

### Evaluation Methods
- ROUGE scores for headline similarity
- A larger model's custom metrics for headline appropriateness and relevance

## Inference

#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.float16,  # load the weights in half precision
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# max_new_tokens=64 is an illustrative cap on headline length.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # load the weights in bfloat16
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# max_new_tokens=64 is an illustrative cap on headline length.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 8-bit integers at load time.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# max_new_tokens=64 is an illustrative cap on headline length.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit at load time.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# max_new_tokens=64 is an illustrative cap on headline length.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```

#### Other optimizations

* _Flash Attention 2_

First, make sure `flash-attn` is installed in your environment: `pip install flash-attn`

```diff
 model = AutoModelForCausalLM.from_pretrained(
     "saidines12/telugu-news-headline-generation",
     torch_dtype=torch.float16,
+    attn_implementation="flash_attention_2"
 ).to(0)
```

### Inputs and outputs

* **Input:** Text string consisting of the instruction prompt followed by the Telugu news article to be headlined.
* **Output:** Generated Telugu-language text: a short headline for the input article.

## Technical Specifications

### Model Architecture and Objective
- Base architecture: Gemma-2
- Training objective: Supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory

### Software
- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration

## Uses

### Direct Use
This model is designed for generating Telugu news headlines from article content. It can be used by:
- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation

### Out-of-Scope Use
- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline generation tasks

## Bias, Risks, and Limitations
- May reflect biases present in Telugu news media
- Performance may vary based on news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines

### Recommendations
- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations
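
As one possible starting point for the checks recommended above, the sketch below flags headlines that are empty, unusually long, or mostly not in Telugu script, so that a human can review them. The helper names and the thresholds (`min_ratio=0.8`, `max_chars=120`) are illustrative assumptions, not values derived from this model's evaluation. It does not check factual accuracy or bias.

```python
# Unicode range of the Telugu block.
TELUGU_START, TELUGU_END = 0x0C00, 0x0C7F


def is_telugu(ch: str) -> bool:
    return TELUGU_START <= ord(ch) <= TELUGU_END


def telugu_ratio(text: str) -> float:
    """Share of script characters that fall in the Telugu block.

    Spaces, digits, and punctuation outside the Telugu block are ignored.
    """
    chars = [ch for ch in text if ch.isalpha() or is_telugu(ch)]
    if not chars:
        return 0.0
    return sum(is_telugu(ch) for ch in chars) / len(chars)


def needs_review(headline: str, min_ratio: float = 0.8, max_chars: int = 120) -> bool:
    """Flag a headline for human review (thresholds are illustrative)."""
    headline = headline.strip()
    if not headline or len(headline) > max_chars:
        return True
    return telugu_ratio(headline) < min_ratio


print(needs_review("తెలుగు వార్తలు"))  # False
print(needs_review("Breaking news"))   # True
```

A check like this only catches surface-level problems; published headlines still need a human editor to verify them.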