---
license: apache-2.0
tags:
- trl
- sft
- telugu
---

# Model Card for Gemma-2B Telugu News Headline Generator

This model is a fine-tuned version of Google's Gemma-2B, optimized for generating Telugu news headlines from article content. It was trained with supervised fine-tuning (SFT) on a Telugu news dataset.

## Model Details

### Model Description

- **Developed by:** Google (base model); fine-tuned for Telugu news headline generation by saidines12
- **Model type:** Decoder-only transformer language model
- **Language(s):** Telugu
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2b

### Model Sources
- **Repository:** saidines12/telugu-news-headline-generation on the Hugging Face Hub
- **Base Model:** google/gemma-2b

## Uses

### Direct Use
This model is designed to generate Telugu news headlines from article content. It can be used by:
- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation

### Out-of-Scope Use
- Should not be used to generate fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline-generation tasks

## Bias, Risks, and Limitations
- May reflect biases present in Telugu news media
- Performance may vary by news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could generate sensationalized headlines

### Recommendations
- Use with human oversight for published content
- Verify generated headlines for factual accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations (a minimal sketch follows this list)
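
One lightweight way to act on the filtering recommendation is a post-generation blocklist check. The sketch below is illustrative only: the function names and blocklist contents are hypothetical, not part of this model's released tooling, and a production system would want a richer moderation layer.

```python
# Hypothetical post-generation filter; the blocklist entries are
# placeholders, not shipped with this model.
BLOCKLIST = ["<inappropriate Telugu term 1>", "<inappropriate Telugu term 2>"]

def is_safe_headline(headline: str) -> bool:
    """Reject a headline if it contains any blocklisted substring."""
    return not any(term in headline for term in BLOCKLIST)

def filter_headlines(headlines: list[str]) -> list[str]:
    """Keep only headlines that pass the blocklist check."""
    return [h for h in headlines if is_safe_headline(h)]
```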

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

# Prompt format: instruction line, newline, then the article text.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")

# Set max_new_tokens explicitly; the default generation length would
# truncate the output for article-length inputs.
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, otherwise the prompt is
# echoed back at the start of the headline.
headline = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```
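
If the generated headlines come out repetitive or over-long, the decoding parameters can be tuned. The values below are illustrative starting points using standard `generate` arguments, not settings that were validated for this model:

```python
# Illustrative decoding settings: beam search with an n-gram
# repetition penalty, reusing `model` and `inputs` from above.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,       # headlines are short; keep the budget small
    num_beams=4,             # beam search for a more fluent single headline
    no_repeat_ngram_size=3,  # discourage repeated phrases
    early_stopping=True,     # stop once all beams emit EOS
)
```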

## Training Details

### Training Data
- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for the headline generation task (a formatting sketch follows this list)
- Articles spanning various news categories
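
The exact preprocessing pipeline is not published. A plausible prompt/target pairing, inferred from the inference prompt shown in the usage example above, is sketched here; the function name, field layout, and EOS handling are all assumptions:

```python
PROMPT = ("Generate relevant, interesting, factual short headline "
          "from this news article in telugu language\n ")

def format_example(article: str, headline: str, eos_token: str) -> str:
    """Assumed SFT training text: inference prompt + article, then the
    target headline on a new line, terminated by the tokenizer's EOS."""
    return f"{PROMPT}{article}\n{headline}{eos_token}"
```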

### Training Procedure

#### Training Hyperparameters
- **Training regime:** FP16 mixed precision
- **Batch size:** 4 per device
- **Gradient accumulation steps:** 4 (effective batch size of 16 per device)
- **Learning rate:** 2e-4
- **Maximum steps:** 30,000
- **Warmup steps:** 25
- **Optimizer:** AdamW
- **Evaluation strategy:** Every 30,000 steps (with a 30,000-step budget, this evaluates once, at the end of training)

#### Hardware Specifications
- GPU training with gradient checkpointing
- Parallel data loading with 8 workers
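
For anyone reproducing the setup, the hyperparameters and hardware settings above map roughly onto the `transformers` `TrainingArguments` below. This is a reconstruction from the listed numbers, not the original training script; `output_dir` is a placeholder, and the arguments would be handed to TRL's `SFTTrainer` together with the formatted dataset.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma-2b-telugu-headlines",  # placeholder path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=30_000,
    warmup_steps=25,
    optim="adamw_torch",          # AdamW
    fp16=True,                    # FP16 mixed precision
    gradient_checkpointing=True,  # trade recomputation for memory
    dataloader_num_workers=8,     # parallel data loading
    dataloader_pin_memory=True,   # pinned host memory for faster transfer
    evaluation_strategy="steps",
    eval_steps=30_000,
)
```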

## Evaluation

### Metrics
- ROUGE scores for headline similarity
- Human evaluation for headline appropriateness
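
When computing ROUGE, note that the default tokenizer in the widely used `rouge-score` package keeps only `[a-z0-9]` tokens, which would strip Telugu script entirely. A minimal whitespace-tokenized setup is sketched below; the tokenizer class is our workaround, not part of this model's evaluation code:

```python
from rouge_score import rouge_scorer

class WhitespaceTokenizer:
    """Split on whitespace so Telugu tokens survive scoring; the
    package's default tokenizer drops non-[a-z0-9] characters."""
    def tokenize(self, text):
        return text.split()

reference_headline = "తెలుగు వార్తల శీర్షిక"   # reference (example text)
generated_headline = "తెలుగు వార్తల శీర్షిక"   # model output (example text)

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"],
                                  tokenizer=WhitespaceTokenizer())
scores = scorer.score(reference_headline, generated_headline)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```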

## Technical Specifications

### Model Architecture and Objective
- Base architecture: Gemma-2B
- Training objective: Supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory

### Software
- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration