# collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9369
- Num Input Tokens Seen: 17420796
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
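As a sanity check on the arithmetic these hyperparameters imply, the sketch below recomputes the effective batch size and an approximate warmup-step count. The total-step estimate (~347) is inferred from the results table, where step 345 falls at epoch 0.9944; the exact rounding the trainer used is an assumption.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4              # per-device train batch size
gradient_accumulation_steps = 32
lr_scheduler_warmup_ratio = 0.05

# Effective (total) train batch size on a single device.
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # matches total_train_batch_size: 128

# Approximate optimizer steps in one epoch, inferred from the table
# (step 345 corresponds to epoch 0.9944).
total_steps = math.ceil(345 / 0.9944)

# Warmup steps under the assumption of ceil(ratio * total_steps).
warmup_steps = math.ceil(lr_scheduler_warmup_ratio * total_steps)
print(total_steps, warmup_steps)
```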
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
2.6692 | 0.0144 | 5 | 1.0838 | 251232 |
2.6572 | 0.0288 | 10 | 1.0019 | 504724 |
2.4487 | 0.0432 | 15 | 0.9898 | 759272 |
2.3685 | 0.0576 | 20 | 0.9814 | 1007344 |
2.2607 | 0.0721 | 25 | 0.9869 | 1261444 |
2.1841 | 0.0865 | 30 | 0.9878 | 1511004 |
1.9613 | 0.1009 | 35 | 0.9908 | 1763396 |
1.9138 | 0.1153 | 40 | 0.9865 | 2017584 |
1.7242 | 0.1297 | 45 | 0.9835 | 2271904 |
1.56 | 0.1441 | 50 | 0.9825 | 2528656 |
1.5102 | 0.1585 | 55 | 0.9806 | 2772068 |
1.4168 | 0.1729 | 60 | 0.9775 | 3023420 |
1.4362 | 0.1874 | 65 | 0.9754 | 3276920 |
1.3918 | 0.2018 | 70 | 0.9761 | 3531492 |
1.5127 | 0.2162 | 75 | 0.9706 | 3784992 |
1.3944 | 0.2306 | 80 | 0.9733 | 4032436 |
1.1925 | 0.2450 | 85 | 0.9723 | 4273560 |
1.183 | 0.2594 | 90 | 0.9640 | 4520508 |
1.2304 | 0.2738 | 95 | 0.9646 | 4770368 |
1.0872 | 0.2882 | 100 | 0.9648 | 5020016 |
1.1574 | 0.3026 | 105 | 0.9607 | 5276716 |
1.1035 | 0.3171 | 110 | 0.9611 | 5521372 |
1.0914 | 0.3315 | 115 | 0.9585 | 5776324 |
0.9998 | 0.3459 | 120 | 0.9598 | 6022272 |
0.9534 | 0.3603 | 125 | 0.9555 | 6260392 |
1.0917 | 0.3747 | 130 | 0.9535 | 6521380 |
1.1094 | 0.3891 | 135 | 0.9535 | 6769228 |
1.1871 | 0.4035 | 140 | 0.9526 | 7024704 |
0.9796 | 0.4179 | 145 | 0.9514 | 7273240 |
1.0659 | 0.4324 | 150 | 0.9495 | 7525180 |
1.1488 | 0.4468 | 155 | 0.9484 | 7775292 |
0.9887 | 0.4612 | 160 | 0.9497 | 8016808 |
1.1045 | 0.4756 | 165 | 0.9451 | 8266100 |
1.0371 | 0.4900 | 170 | 0.9465 | 8514128 |
1.0966 | 0.5044 | 175 | 0.9450 | 8763440 |
1.0408 | 0.5188 | 180 | 0.9460 | 9017676 |
1.0891 | 0.5332 | 185 | 0.9435 | 9265972 |
1.0561 | 0.5476 | 190 | 0.9450 | 9522024 |
0.9537 | 0.5621 | 195 | 0.9434 | 9764580 |
0.9373 | 0.5765 | 200 | 0.9431 | 10016796 |
1.1323 | 0.5909 | 205 | 0.9423 | 10269756 |
1.2019 | 0.6053 | 210 | 0.9438 | 10520656 |
0.9699 | 0.6197 | 215 | 0.9416 | 10771848 |
0.9654 | 0.6341 | 220 | 0.9426 | 11022436 |
0.9461 | 0.6485 | 225 | 0.9405 | 11274272 |
0.9865 | 0.6629 | 230 | 0.9414 | 11531652 |
0.9315 | 0.6774 | 235 | 0.9391 | 11784148 |
0.9826 | 0.6918 | 240 | 0.9406 | 12037420 |
0.984 | 0.7062 | 245 | 0.9396 | 12295780 |
1.1796 | 0.7206 | 250 | 0.9419 | 12550852 |
1.0881 | 0.7350 | 255 | 0.9367 | 12796424 |
0.8628 | 0.7494 | 260 | 0.9386 | 13048276 |
1.094 | 0.7638 | 265 | 0.9372 | 13302068 |
1.0862 | 0.7782 | 270 | 0.9385 | 13552976 |
1.0226 | 0.7926 | 275 | 0.9375 | 13805560 |
0.9964 | 0.8071 | 280 | 0.9359 | 14063732 |
1.0379 | 0.8215 | 285 | 0.9368 | 14323416 |
0.7735 | 0.8359 | 290 | 0.9365 | 14578864 |
0.8855 | 0.8503 | 295 | 0.9354 | 14831324 |
0.9687 | 0.8647 | 300 | 0.9368 | 15079640 |
1.0087 | 0.8791 | 305 | 0.9351 | 15336076 |
0.8832 | 0.8935 | 310 | 0.9368 | 15598480 |
0.9207 | 0.9079 | 315 | 0.9353 | 15852360 |
0.9436 | 0.9224 | 320 | 0.9372 | 16105580 |
1.0136 | 0.9368 | 325 | 0.9360 | 16360756 |
0.9331 | 0.9512 | 330 | 0.9334 | 16610568 |
0.8251 | 0.9656 | 335 | 0.9353 | 16866280 |
0.8415 | 0.9800 | 340 | 0.9334 | 17114340 |
1.0314 | 0.9944 | 345 | 0.9360 | 17367496 |
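The validation-loss trajectory above can be summarized with a quick relative-improvement calculation; the two loss values are copied from this card (initial eval loss at step 0 and the final reported eval loss).

```python
# Relative reduction in validation loss over the run.
initial_loss = 1.1282  # validation loss at step 0 (from the table)
final_loss = 0.9369    # final eval loss reported at the top of this card
improvement = (initial_loss - final_loss) / initial_loss
print(f"{improvement:.1%}")  # ~17% relative reduction
```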
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
## Model tree

Full model id: RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1. Base model: [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b).