saidines12 committed
Commit 3d81984 · verified · 1 Parent(s): 1badba0

Update README.md


Added 2b-it training content

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
```diff
@@ -2,14 +2,14 @@
 datasets:
 - saidines12/telugu_news_dataset
 base_model:
-- google/gemma-2b
+- google/gemma-2b-it
 ---



 # Model Card for Gemma-2B Telugu News Headline Generator

-This model is a fine-tuned version of Google's Gemma-2B model, optimized for generating Telugu news headlines from article content. It has been trained using Supervised Fine-Tuning (SFT) on a Telugu news dataset.
+This model is a fine-tuned version of Google's Gemma-2B Instruction model, optimized for generating Telugu news headlines from article content. It has been trained using Supervised Fine-Tuning (SFT) on a Telugu news dataset.

 ## Model Details

@@ -23,7 +23,7 @@ This model is a fine-tuned version of Google's Gemma-2B model, optimized for gen

 ### Model Sources
 - **Repository:** Hugging Face Hub
-- **Base Model:** google/gemma-2b
+- **Base Model:** google/gemma-2b-it

 ## Uses

@@ -77,13 +77,13 @@ headline = tokenizer.decode(outputs[0], skip_special_tokens=True)

 #### Training Hyperparameters
 - **Training regime:** FP16 mixed precision
-- **Batch size:** 4 per device
+- **Batch size:** 6 per device
 - **Gradient accumulation steps:** 4
 - **Learning rate:** 2e-4
-- **Maximum steps:** 30,000
+- **Maximum steps:** 20,000
 - **Warmup steps:** 25
 - **Optimizer:** AdamW
-- **Evaluation strategy:** Every 30000 steps
+- **Evaluation strategy:** Every 20000 steps

 #### Hardware Specifications
 - GPU training with gradient checkpointing
@@ -100,7 +100,7 @@ headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
 ## Technical Specifications

 ### Model Architecture and Objective
-- Base architecture: Gemma-2B
+- Base architecture: Gemma-2
 - Training objective: Supervised fine-tuning for headline generation
 - Gradient checkpointing enabled for memory efficiency
 - Optimized data loading with pinned memory
```
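The hunk headers above show that the card's usage section ends with `headline = tokenizer.decode(outputs[0], skip_special_tokens=True)`. For readers without the full README, here is a minimal sketch of that inference flow; the repo id `saidines12/gemma-2b-telugu-headlines` and the prompt wording are placeholders, not values confirmed by this commit:

```python
# Minimal inference sketch. The repo id and prompt format below are
# illustrative assumptions; the actual values live in the README's full
# usage section, which this diff only shows the last line of.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "saidines12/gemma-2b-telugu-headlines"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

article = "..."  # Telugu news article text goes here
prompt = f"Generate a headline for the following Telugu news article:\n{article}\nHeadline:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(headline)
```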
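The third hunk's hyperparameters (batch size 6 per device, gradient accumulation 4, learning rate 2e-4, 20,000 max steps, 25 warmup steps, AdamW, FP16, plus the gradient checkpointing and pinned-memory data loading listed under Hardware and Technical Specifications) map onto a `transformers` `TrainingArguments` configuration roughly as sketched below. This is reconstructed from the card's stated values, not the author's actual training script; `output_dir` is a placeholder.

```python
# Sketch of TrainingArguments matching the card's stated hyperparameters
# after this commit; values are taken from the diff, everything else is
# an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2b-telugu-headlines",  # placeholder path
    per_device_train_batch_size=6,           # "Batch size: 6 per device"
    gradient_accumulation_steps=4,           # "Gradient accumulation steps: 4"
    learning_rate=2e-4,                      # "Learning rate: 2e-4"
    max_steps=20_000,                        # "Maximum steps: 20,000"
    warmup_steps=25,                         # "Warmup steps: 25"
    optim="adamw_torch",                     # "Optimizer: AdamW"
    fp16=True,                               # "FP16 mixed precision"
    eval_strategy="steps",                   # "evaluation_strategy" in transformers < 4.41
    eval_steps=20_000,                       # "Evaluation strategy: Every 20000 steps"
    gradient_checkpointing=True,             # per Hardware Specifications
    dataloader_pin_memory=True,              # "Optimized data loading with pinned memory"
)
```

With accumulation set to 4, the effective batch size is 24 sequences per optimizer step, and evaluating every 20,000 steps on a 20,000-step run amounts to a single evaluation at the end of training.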