# collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.2131
- Num Input Tokens Seen: 4981168
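
The card does not include a usage example; below is a minimal, hedged loading sketch using the standard `transformers` API. The repo id is the one this card is published under; the dtype and `device_map` choices are assumptions for convenience, not part of the training setup.

```python
# Minimal inference sketch (assumes access to the base gemma-2-2b weights
# has been granted on the Hub; bf16 and device_map="auto" are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```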
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
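
As a reproduction aid, here is a hedged sketch of how these values map onto `transformers.TrainingArguments`. The `output_dir` is a placeholder and the training script itself is not published; note that the effective batch size of 128 follows from 8 per-device examples × 16 gradient-accumulation steps on a single device.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```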
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3101        | 0.0528 | 5    | 1.2785          | 259360            |
| 1.0508        | 0.1057 | 10   | 1.2293          | 530168            |
| 0.9566        | 0.1585 | 15   | 1.2129          | 794520            |
| 0.8309        | 0.2114 | 20   | 1.2625          | 1058200           |
| 0.8029        | 0.2642 | 25   | 1.2461          | 1327328           |
| 0.6388        | 0.3170 | 30   | 1.2820          | 1587968           |
| 0.5711        | 0.3699 | 35   | 1.2793          | 1853352           |
| 0.5408        | 0.4227 | 40   | 1.2597          | 2118856           |
| 0.5223        | 0.4756 | 45   | 1.2438          | 2384304           |
| 0.4692        | 0.5284 | 50   | 1.2533          | 2651296           |
| 0.5336        | 0.5812 | 55   | 1.2343          | 2907944           |
| 0.4685        | 0.6341 | 60   | 1.2426          | 3175608           |
| 0.4822        | 0.6869 | 65   | 1.2253          | 3449616           |
| 0.4543        | 0.7398 | 70   | 1.2388          | 3719976           |
| 0.4193        | 0.7926 | 75   | 1.2284          | 3987200           |
| 0.4007        | 0.8454 | 80   | 1.2234          | 4247992           |
| 0.3711        | 0.8983 | 85   | 1.2196          | 4508112           |
| 0.4195        | 0.9511 | 90   | 1.2262          | 4769520           |
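
For scale (derived from the table above, not from separate logs): each logged interval covers 5 optimizer steps of 128 sequences, and the token counter advances by roughly 257k-274k tokens per interval, i.e. about 51k-55k tokens per optimizer step, or an average of roughly 400-430 tokens per training sequence.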
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1