collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9397
  • Num Input Tokens Seen: 17213660
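Assuming the reported loss is a mean per-token cross-entropy in nats (the usual convention for causal-LM fine-tuning with the Hugging Face Trainer), it corresponds to an evaluation perplexity of roughly 2.56:

```python
import math

# Reported evaluation loss from the card; assumed to be mean
# per-token cross-entropy in nats.
eval_loss = 0.9397

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ~2.56
```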

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
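The total train batch size above follows from the per-device batch size and gradient accumulation. A minimal sketch of the arithmetic, assuming a single device (the card does not list a device count):

```python
# Effective (total) train batch size =
#   per-device batch size x gradient accumulation steps x number of devices
train_batch_size = 4
gradient_accumulation_steps = 32
num_devices = 1  # assumption: not stated in the card

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # matches the reported 128
```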

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.8683        | 0.0141 | 5    | 1.0844          | 236836            |
| 2.4917        | 0.0282 | 10   | 1.0063          | 477060            |
| 2.4805        | 0.0423 | 15   | 0.9925          | 726020            |
| 2.4109        | 0.0564 | 20   | 0.9841          | 973680            |
| 2.296         | 0.0705 | 25   | 0.9858          | 1223540           |
| 2.0121        | 0.0846 | 30   | 0.9919          | 1470300           |
| 1.7927        | 0.0987 | 35   | 0.9894          | 1718324           |
| 1.8281        | 0.1128 | 40   | 0.9977          | 1962952           |
| 1.7373        | 0.1268 | 45   | 0.9951          | 2208104           |
| 1.5941        | 0.1409 | 50   | 0.9881          | 2455412           |
| 1.6058        | 0.1550 | 55   | 0.9857          | 2696832           |
| 1.0647        | 0.1691 | 60   | 0.9818          | 2941868           |
| 1.1676        | 0.1832 | 65   | 0.9758          | 3188060           |
| 1.2806        | 0.1973 | 70   | 0.9758          | 3427696           |
| 1.0585        | 0.2114 | 75   | 0.9734          | 3672608           |
| 1.0442        | 0.2255 | 80   | 0.9696          | 3910204           |
| 1.0145        | 0.2396 | 85   | 0.9699          | 4146872           |
| 1.0364        | 0.2537 | 90   | 0.9652          | 4394008           |
| 1.0252        | 0.2678 | 95   | 0.9647          | 4635300           |
| 0.969         | 0.2819 | 100  | 0.9630          | 4879116           |
| 0.7795        | 0.2960 | 105  | 0.9612          | 5118936           |
| 0.8606        | 0.3101 | 110  | 0.9571          | 5366792           |
| 1.0389        | 0.3242 | 115  | 0.9581          | 5612876           |
| 0.8369        | 0.3383 | 120  | 0.9558          | 5861964           |
| 0.8261        | 0.3524 | 125  | 0.9563          | 6109352           |
| 0.7797        | 0.3665 | 130  | 0.9521          | 6350016           |
| 0.91          | 0.3805 | 135  | 0.9539          | 6594400           |
| 0.9656        | 0.3946 | 140  | 0.9528          | 6829540           |
| 0.8705        | 0.4087 | 145  | 0.9517          | 7073132           |
| 0.9275        | 0.4228 | 150  | 0.9501          | 7317792           |
| 0.7878        | 0.4369 | 155  | 0.9495          | 7562692           |
| 0.79          | 0.4510 | 160  | 0.9493          | 7804712           |
| 0.9756        | 0.4651 | 165  | 0.9486          | 8045908           |
| 0.831         | 0.4792 | 170  | 0.9501          | 8295248           |
| 0.7312        | 0.4933 | 175  | 0.9482          | 8539448           |
| 0.8828        | 0.5074 | 180  | 0.9462          | 8782312           |
| 0.654         | 0.5215 | 185  | 0.9476          | 9028520           |
| 0.9007        | 0.5356 | 190  | 0.9451          | 9272816           |
| 0.7856        | 0.5497 | 195  | 0.9463          | 9519724           |
| 0.6986        | 0.5638 | 200  | 0.9445          | 9769440           |
| 0.8185        | 0.5779 | 205  | 0.9482          | 10012624          |
| 0.7951        | 0.5920 | 210  | 0.9453          | 10257436          |
| 0.7885        | 0.6061 | 215  | 0.9442          | 10497084          |
| 0.8135        | 0.6202 | 220  | 0.9452          | 10726612          |
| 0.8553        | 0.6342 | 225  | 0.9432          | 10964756          |
| 0.7149        | 0.6483 | 230  | 0.9454          | 11206028          |
| 0.796         | 0.6624 | 235  | 0.9439          | 11446772          |
| 0.7876        | 0.6765 | 240  | 0.9443          | 11686044          |
| 0.7328        | 0.6906 | 245  | 0.9433          | 11936452          |
| 0.8117        | 0.7047 | 250  | 0.9431          | 12174492          |
| 0.9161        | 0.7188 | 255  | 0.9400          | 12410412          |
| 0.6793        | 0.7329 | 260  | 0.9424          | 12649736          |
| 0.7372        | 0.7470 | 265  | 0.9430          | 12887028          |
| 0.6329        | 0.7611 | 270  | 0.9402          | 13126712          |
| 0.8913        | 0.7752 | 275  | 0.9416          | 13368188          |
| 0.83          | 0.7893 | 280  | 0.9409          | 13615264          |
| 0.6657        | 0.8034 | 285  | 0.9400          | 13855436          |
| 0.9027        | 0.8175 | 290  | 0.9404          | 14102064          |
| 0.7206        | 0.8316 | 295  | 0.9401          | 14340172          |
| 0.7678        | 0.8457 | 300  | 0.9399          | 14573172          |
| 0.8187        | 0.8598 | 305  | 0.9401          | 14816224          |
| 0.6861        | 0.8739 | 310  | 0.9399          | 15065152          |
| 0.8274        | 0.8879 | 315  | 0.9384          | 15306488          |
| 0.8374        | 0.9020 | 320  | 0.9391          | 15543972          |
| 0.7515        | 0.9161 | 325  | 0.9370          | 15780660          |
| 0.8439        | 0.9302 | 330  | 0.9393          | 16027512          |
| 0.7666        | 0.9443 | 335  | 0.9410          | 16271828          |
| 0.7781        | 0.9584 | 340  | 0.9404          | 16516708          |
| 0.77          | 0.9725 | 345  | 0.9435          | 16772604          |
| 0.6227        | 0.9866 | 350  | 0.9362          | 17015180          |
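With constant_with_warmup and a warmup ratio of 0.05, the warmup length depends on the total number of optimizer steps, which the card does not state directly but which can be estimated from the logged schedule (step 350 lands at epoch 0.9866, so one epoch is roughly 355 steps and warmup covers about the first 18). A back-of-envelope sketch, assuming these logged values:

```python
# Estimate steps per epoch and warmup steps from the last logged row.
logged_step, logged_epoch = 350, 0.9866  # from the training results above
warmup_ratio = 0.05
num_epochs = 1

steps_per_epoch = logged_step / logged_epoch          # ~354.8
total_steps = round(steps_per_epoch * num_epochs)     # ~355
warmup_steps = round(total_steps * warmup_ratio)      # ~18
print(total_steps, warmup_steps)
```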

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd2

  • Base model: google/gemma-2-27b (this model is one of its fine-tunes)