# collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0989
- Num Input Tokens Seen: 13720456
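
A minimal loading sketch, assuming the checkpoint is published on the Hub as jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2 (the repository this card belongs to); nothing beyond standard `transformers` generation is implied:

```python
# Minimal usage sketch; the repo id is assumed to match this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short haiku about fine-tuning:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```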
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
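
A hedged sketch of how these values map onto `transformers.TrainingArguments`; the actual training script and dataset are not documented here, and the `output_dir` name is illustrative:

```python
# Hedged configuration sketch: maps the hyperparameters listed above onto
# transformers.TrainingArguments. Dataset loading and the Trainer call are
# omitted because the training data is not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2",  # illustrative
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```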
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.5681 | 0.0206 | 5 | 1.3561 | 281208 |
1.3762 | 0.0412 | 10 | 1.2366 | 561408 |
1.2447 | 0.0618 | 15 | 1.1711 | 846520 |
1.2027 | 0.0824 | 20 | 1.1444 | 1123552 |
1.3092 | 0.1030 | 25 | 1.1172 | 1406248 |
1.1897 | 0.1236 | 30 | 1.1186 | 1693528 |
1.0685 | 0.1441 | 35 | 1.1195 | 1979856 |
0.9925 | 0.1647 | 40 | 1.1286 | 2262504 |
1.0026 | 0.1853 | 45 | 1.1277 | 2544760 |
0.9181 | 0.2059 | 50 | 1.1374 | 2825680 |
0.9007 | 0.2265 | 55 | 1.1411 | 3103440 |
0.8626 | 0.2471 | 60 | 1.1421 | 3388104 |
0.8576 | 0.2677 | 65 | 1.1406 | 3668152 |
0.9025 | 0.2883 | 70 | 1.1459 | 3947528 |
0.8566 | 0.3089 | 75 | 1.1449 | 4229392 |
0.8071 | 0.3295 | 80 | 1.1467 | 4514912 |
0.7788 | 0.3501 | 85 | 1.1398 | 4800168 |
0.7999 | 0.3707 | 90 | 1.1427 | 5085472 |
0.7548 | 0.3912 | 95 | 1.1401 | 5370096 |
0.7775 | 0.4118 | 100 | 1.1324 | 5654448 |
0.6659 | 0.4324 | 105 | 1.1390 | 5932488 |
0.7151 | 0.4530 | 110 | 1.1345 | 6217432 |
0.7126 | 0.4736 | 115 | 1.1303 | 6504472 |
0.5812 | 0.4942 | 120 | 1.1395 | 6786136 |
0.7462 | 0.5148 | 125 | 1.1331 | 7075544 |
0.6824 | 0.5354 | 130 | 1.1306 | 7349632 |
0.7777 | 0.5560 | 135 | 1.1333 | 7638056 |
0.614 | 0.5766 | 140 | 1.1285 | 7926232 |
0.6151 | 0.5972 | 145 | 1.1264 | 8206848 |
0.7309 | 0.6178 | 150 | 1.1235 | 8494256 |
0.6219 | 0.6384 | 155 | 1.1226 | 8771192 |
0.6518 | 0.6589 | 160 | 1.1194 | 9060384 |
0.6101 | 0.6795 | 165 | 1.1167 | 9344632 |
0.6374 | 0.7001 | 170 | 1.1139 | 9625824 |
0.6431 | 0.7207 | 175 | 1.1153 | 9909464 |
0.6351 | 0.7413 | 180 | 1.1112 | 10193712 |
0.6205 | 0.7619 | 185 | 1.1099 | 10473824 |
0.5593 | 0.7825 | 190 | 1.1086 | 10757760 |
0.6611 | 0.8031 | 195 | 1.1067 | 11044304 |
0.604 | 0.8237 | 200 | 1.1089 | 11335648 |
0.5985 | 0.8443 | 205 | 1.1045 | 11616672 |
0.6425 | 0.8649 | 210 | 1.1041 | 11904256 |
0.6244 | 0.8855 | 215 | 1.1036 | 12186800 |
0.4801 | 0.9060 | 220 | 1.1015 | 12472520 |
0.5418 | 0.9266 | 225 | 1.1026 | 12757120 |
0.5693 | 0.9472 | 230 | 1.0992 | 13037120 |
0.6361 | 0.9678 | 235 | 1.0997 | 13321752 |
0.5677 | 0.9884 | 240 | 1.0984 | 13608048 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1