---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2
  results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1032
- Num Input Tokens Seen: 21819352
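
Since the card does not yet include usage instructions, here is a minimal loading sketch. The repo id below is assumed to match this card's model name, and the dtype and prompt are illustrative choices, not part of the original setup.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate text.
# The repo id is an assumption based on this card's model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```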
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
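
As a reproducibility aid, the sketch below maps the hyperparameters above onto a TRL `SFTConfig`/`SFTTrainer` call. Only the listed values come from this card; the training data is undocumented, so the dataset line, the `dataset_text_field`, and the `output_dir` are placeholders.

```python
# Sketch of a TRL SFT run matching the listed hyperparameters.
# Dataset, text field, and output_dir are assumptions, not from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Placeholder: the actual training dataset is unknown.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",
    dataset_text_field="text",       # assumed column name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```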
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.5432 | 0.0127 | 5 | 1.3798 | 276728 |
1.4208 | 0.0254 | 10 | 1.2917 | 554256 |
1.4236 | 0.0381 | 15 | 1.2111 | 833616 |
1.3033 | 0.0508 | 20 | 1.1647 | 1109744 |
1.2167 | 0.0634 | 25 | 1.1518 | 1384696 |
1.0953 | 0.0761 | 30 | 1.1341 | 1664008 |
0.9168 | 0.0888 | 35 | 1.1461 | 1944176 |
0.9273 | 0.1015 | 40 | 1.1542 | 2218368 |
0.8943 | 0.1142 | 45 | 1.1696 | 2492552 |
0.8168 | 0.1269 | 50 | 1.1792 | 2773488 |
0.7781 | 0.1396 | 55 | 1.1739 | 3050208 |
0.8131 | 0.1523 | 60 | 1.1845 | 3326584 |
0.6973 | 0.1649 | 65 | 1.1836 | 3606104 |
0.7054 | 0.1776 | 70 | 1.1733 | 3887952 |
0.685 | 0.1903 | 75 | 1.1764 | 4170752 |
0.5768 | 0.2030 | 80 | 1.1771 | 4444816 |
0.6494 | 0.2157 | 85 | 1.1719 | 4718552 |
0.5484 | 0.2284 | 90 | 1.1698 | 4998784 |
0.5609 | 0.2411 | 95 | 1.1739 | 5274536 |
0.4343 | 0.2538 | 100 | 1.1755 | 5553760 |
0.5656 | 0.2665 | 105 | 1.1654 | 5828328 |
0.5633 | 0.2791 | 110 | 1.1696 | 6104712 |
0.4485 | 0.2918 | 115 | 1.1631 | 6380840 |
0.4853 | 0.3045 | 120 | 1.1658 | 6651752 |
0.4552 | 0.3172 | 125 | 1.1593 | 6928872 |
0.4465 | 0.3299 | 130 | 1.1584 | 7200200 |
0.4402 | 0.3426 | 135 | 1.1605 | 7481976 |
0.4228 | 0.3553 | 140 | 1.1536 | 7765000 |
0.5075 | 0.3680 | 145 | 1.1529 | 8037040 |
0.3783 | 0.3807 | 150 | 1.1505 | 8313288 |
0.4 | 0.3933 | 155 | 1.1464 | 8593584 |
0.4482 | 0.4060 | 160 | 1.1507 | 8869384 |
0.4995 | 0.4187 | 165 | 1.1418 | 9145296 |
0.4386 | 0.4314 | 170 | 1.1420 | 9423816 |
0.3944 | 0.4441 | 175 | 1.1406 | 9707024 |
0.5069 | 0.4568 | 180 | 1.1408 | 9977424 |
0.36 | 0.4695 | 185 | 1.1408 | 10247568 |
0.4558 | 0.4822 | 190 | 1.1369 | 10525312 |
0.4699 | 0.4948 | 195 | 1.1341 | 10807080 |
0.5118 | 0.5075 | 200 | 1.1346 | 11075200 |
0.5246 | 0.5202 | 205 | 1.1310 | 11355128 |
0.5085 | 0.5329 | 210 | 1.1323 | 11635976 |
0.3497 | 0.5456 | 215 | 1.1290 | 11912608 |
0.4282 | 0.5583 | 220 | 1.1304 | 12191360 |
0.3405 | 0.5710 | 225 | 1.1261 | 12468896 |
0.4814 | 0.5837 | 230 | 1.1271 | 12748408 |
0.3857 | 0.5964 | 235 | 1.1262 | 13023016 |
0.4579 | 0.6090 | 240 | 1.1245 | 13302328 |
0.4054 | 0.6217 | 245 | 1.1244 | 13575408 |
0.4019 | 0.6344 | 250 | 1.1222 | 13851880 |
0.4085 | 0.6471 | 255 | 1.1206 | 14126456 |
0.3261 | 0.6598 | 260 | 1.1226 | 14411880 |
0.3434 | 0.6725 | 265 | 1.1197 | 14693704 |
0.3898 | 0.6852 | 270 | 1.1189 | 14972552 |
0.3275 | 0.6979 | 275 | 1.1202 | 15244856 |
0.3851 | 0.7105 | 280 | 1.1181 | 15517984 |
0.3896 | 0.7232 | 285 | 1.1167 | 15793480 |
0.4382 | 0.7359 | 290 | 1.1164 | 16072136 |
0.4112 | 0.7486 | 295 | 1.1147 | 16347632 |
0.4165 | 0.7613 | 300 | 1.1153 | 16622200 |
0.3549 | 0.7740 | 305 | 1.1137 | 16896656 |
0.3859 | 0.7867 | 310 | 1.1130 | 17175712 |
0.3636 | 0.7994 | 315 | 1.1129 | 17456320 |
0.4647 | 0.8121 | 320 | 1.1109 | 17735952 |
0.3973 | 0.8247 | 325 | 1.1121 | 18011048 |
0.3857 | 0.8374 | 330 | 1.1100 | 18285984 |
0.3692 | 0.8501 | 335 | 1.1105 | 18560024 |
0.4178 | 0.8628 | 340 | 1.1092 | 18834584 |
0.3232 | 0.8755 | 345 | 1.1070 | 19113832 |
0.3482 | 0.8882 | 350 | 1.1070 | 19390200 |
0.4256 | 0.9009 | 355 | 1.1065 | 19670664 |
0.4421 | 0.9136 | 360 | 1.1040 | 19946664 |
0.4513 | 0.9262 | 365 | 1.1046 | 20229584 |
0.395 | 0.9389 | 370 | 1.1059 | 20503736 |
0.3129 | 0.9516 | 375 | 1.1033 | 20776680 |
0.3915 | 0.9643 | 380 | 1.1048 | 21053616 |
0.3239 | 0.9770 | 385 | 1.1003 | 21327312 |
0.3765 | 0.9897 | 390 | 1.1039 | 21601936 |
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
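
To check a local environment against these pins, a quick sketch (the list above remains the authoritative source):

```python
# Print installed versions to compare against the pins above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.44.0
print("PyTorch:", torch.__version__)              # expected 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.20.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.19.1
```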