Update README.md
Added 2b-it training content

README.md (CHANGED):
@@ -2,14 +2,14 @@
 datasets:
 - saidines12/telugu_news_dataset
 base_model:
-- google/gemma-2b
+- google/gemma-2b-it
 ---
 
 
 
 # Model Card for Gemma-2B Telugu News Headline Generator
 
-This model is a fine-tuned version of Google's Gemma-2B model, optimized for generating Telugu news headlines from article content. It has been trained using Supervised Fine-Tuning (SFT) on a Telugu news dataset.
+This model is a fine-tuned version of Google's instruction-tuned Gemma-2B model (gemma-2b-it), optimized for generating Telugu news headlines from article content. It has been trained using Supervised Fine-Tuning (SFT) on a Telugu news dataset.
 
 ## Model Details
 
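Only the metadata and the description change in this hunk; the card's own usage snippet sits outside it. For orientation, a minimal inference sketch consistent with the updated description; the repository id and the prompt template below are placeholders, not values taken from the card:

```python
# Hypothetical inference sketch: repo_id and the prompt template are
# placeholders, not values taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/gemma-2b-it-telugu-headlines"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # matches the card's FP16 training regime
    device_map="auto",
)

article = "..."  # Telugu article text goes here
prompt = f"Article: {article}\nHeadline:"  # assumed prompt format

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Headlines are short, so a small `max_new_tokens` budget is usually sufficient.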
@@ -23,7 +23,7 @@ This model is a fine-tuned version of Google's Gemma-2B model, optimized for gen
 
 ### Model Sources
 - **Repository:** Hugging Face Hub
-- **Base Model:** google/gemma-2b
+- **Base Model:** google/gemma-2b-it
 
 ## Uses
 
@@ -77,13 +77,13 @@ headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
 
 #### Training Hyperparameters
 - **Training regime:** FP16 mixed precision
-- **Batch size:**
+- **Batch size:** 6 per device
 - **Gradient accumulation steps:** 4
 - **Learning rate:** 2e-4
-- **Maximum steps:**
+- **Maximum steps:** 20,000
 - **Warmup steps:** 25
 - **Optimizer:** AdamW
-- **Evaluation strategy:** Every
+- **Evaluation strategy:** Every 20,000 steps
 
 #### Hardware Specifications
 - GPU training with gradient checkpointing
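The values added above map directly onto Hugging Face `TrainingArguments`. A sketch under the assumption that the run used the `transformers` Trainer / TRL SFT stack, which the card implies (SFT) but does not show; `output_dir` is a placeholder:

```python
# Sketch of the training configuration implied by the hyperparameters above;
# assumes the Hugging Face Trainer / TRL SFT stack, which the card does not confirm.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2b-it-telugu-headlines",  # placeholder
    fp16=True,                       # FP16 mixed precision
    per_device_train_batch_size=6,   # batch size: 6 per device
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=20_000,
    warmup_steps=25,
    optim="adamw_torch",             # AdamW
    evaluation_strategy="steps",     # renamed to eval_strategy in newer transformers releases
    eval_steps=20_000,
)
```

Note that with `eval_steps` equal to `max_steps`, evaluation effectively runs once, at the very end of training.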
@@ -100,7 +100,7 @@ headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ## Technical Specifications
 
 ### Model Architecture and Objective
-- Base architecture: Gemma-
+- Base architecture: Gemma-2B
 - Training objective: Supervised fine-tuning for headline generation
 - Gradient checkpointing enabled for memory efficiency
 - Optimized data loading with pinned memory
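The last two bullets refer to standard memory and data-loading switches in the `transformers`/PyTorch stack. A sketch of how they are typically enabled; illustrative only, since the card does not include the training script:

```python
# Illustrative only: the card names gradient checkpointing and pinned-memory
# data loading, but does not show the code that enables them.
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
model.gradient_checkpointing_enable()  # recompute activations to cut GPU memory use

args = TrainingArguments(
    output_dir="out",                 # placeholder
    gradient_checkpointing=True,      # equivalent switch when using the Trainer API
    dataloader_pin_memory=True,       # pinned host memory for faster host-to-GPU copies
)
```

Gradient checkpointing trades extra forward-pass compute for a large reduction in activation memory, which is typically what lets a 2B-parameter SFT run fit in limited GPU memory.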
|