---
base_model:
- google/gemma-2-2b-it
datasets:
- saidines12/telugu_news_dataset
---

# Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B-it model, optimized for generating Telugu news headlines from article content. It was trained with Supervised Fine-Tuning (SFT) on a Telugu news dataset.

## Model Details

### Model Description

- **Developed by:** Google (base model), with Telugu news fine-tuning
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2-2b-it

### Model Sources

- **Repository:** Hugging Face Hub
- **Base Model:** google/gemma-2-2b-it

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

# Append the Telugu article text after the instruction prompt.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Details

### Training Data

- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for the headline generation task
- Articles spanning various news categories

### Training Procedure

#### Training Hyperparameters

- **Training regime:** FP16 mixed precision
- **Batch size:** 6 per device
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Maximum steps:** 20,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** every 20,000 steps

#### Hardware Specifications

- GPU training with gradient checkpointing
- Parallel data loading with 8 workers
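For reference, the hyperparameters above might map onto a TRL `SFTTrainer` run roughly as sketched below. This is a reconstruction under assumptions, not the original training script: the dataset text field name is illustrative, evaluation is omitted for brevity, and the API shown assumes a recent TRL version.

```python
# pip install trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("saidines12/telugu_news_dataset", split="train")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

args = SFTConfig(
    output_dir="telugu-news-headline-generation",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=20_000,
    warmup_steps=25,
    fp16=True,                    # FP16 mixed precision
    optim="adamw_torch",          # AdamW optimizer
    gradient_checkpointing=True,  # trade compute for memory
    dataloader_num_workers=8,     # parallel data loading
    dataloader_pin_memory=True,   # pinned memory for faster host-to-GPU copies
    dataset_text_field="text",    # illustrative; use the dataset's actual field name
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```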
## Evaluation

### ROUGE Score Comparison

| Metric  | Base Model | Fine-tuned Model | Improvement |
|---------|------------|------------------|-------------|
| ROUGE-1 | 3.39       | 4.64             | +1.25       |
| ROUGE-2 | 0.26       | 0.41             | +0.15       |
| ROUGE-L | 3.38       | 4.63             | +1.25       |

### Prediction Comparison Using a Larger Model as Judge

| Category           | Count | Percentage |
|--------------------|-------|------------|
| Total samples      | 5,962 | 100%       |
| Same predictions   | 3     | 0.05%      |
| Better predictions | 4,610 | 77.32%     |
| Worse predictions  | 1,349 | 22.63%     |

### Evaluation Methods

- ROUGE scores for headline similarity
- A larger judge model's custom metrics for headline appropriateness and relevance

## Inference

#### Running the model on a GPU using different precisions

* _Using `torch.float16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.float16,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using `torch.bfloat16`_

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

#### Quantized Versions through `bitsandbytes`

* _Using 8-bit precision (int8)_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

* _Using 4-bit precision_

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained(
    "saidines12/telugu-news-headline-generation",
    quantization_config=quantization_config,
)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
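The snippets above rely on default generation settings. Since headlines are short, it may help to cap the number of new tokens and decode only the generated portion of the output; a minimal sketch follows (the values are illustrative, not tuned):

```python
import torch

# Assumes `model`, `tokenizer`, and `input_ids` from any of the snippets above.
with torch.no_grad():
    outputs = model.generate(
        **input_ids,
        max_new_tokens=32,  # headlines are short; cap generation length
        do_sample=False,    # greedy decoding for a deterministic headline
    )

# Strip the prompt tokens so only the generated headline is decoded.
prompt_length = input_ids["input_ids"].shape[-1]
headline = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(headline)
```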
#### Other optimizations

* _Flash Attention 2_

First make sure to install `flash-attn` in your environment: `pip install flash-attn`

```diff
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
+   attn_implementation="flash_attention_2"
).to(0)
```

### Inputs and outputs

* **Input:** Text string, such as a Telugu news article preceded by the headline-generation prompt.
* **Output:** A generated Telugu-language headline for the input article.

## Technical Specifications

### Model Architecture and Objective

- Base architecture: Gemma-2
- Training objective: supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory

### Software

- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration

## Uses

### Direct Use

This model is designed for generating Telugu news headlines from article content. It can be used by:

- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation

### Out-of-Scope Use

- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline-generation tasks

## Bias, Risks, and Limitations

- May reflect biases present in Telugu news media
- Performance may vary by news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines

### Recommendations

- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations (a minimal illustration follows below)
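As a minimal illustration of the last recommendation, a hypothetical post-processing filter is sketched below. The blocklist and length limit are placeholders, not a vetted safety mechanism:

```python
def passes_basic_checks(headline: str, blocklist: set[str], max_words: int = 15) -> bool:
    """Reject empty, overlong, or blocklisted headlines before publishing."""
    words = headline.split()
    if not words or len(words) > max_words:
        return False
    # Reject headlines containing any blocklisted term.
    return not any(term in headline for term in blocklist)

# Usage: only surface headlines that pass the checks.
blocklist = {"example-banned-term"}  # fill with terms your editorial policy prohibits
if passes_basic_checks(headline, blocklist):
    print(headline)
```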