collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9369
  • Num Input Tokens Seen: 17,420,796
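
For context, the reported cross-entropy loss converts directly to perplexity; a quick check (pure arithmetic, no model weights needed):

```python
import math

eval_loss = 0.9369          # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 2.55
```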

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
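
A minimal sketch of how these settings fit together. The warmup step count below is an estimate, not a value reported in the card: warmup_ratio 0.05 of the roughly 347 optimizer steps implied by the results table (step 345 at epoch 0.9944) gives about 18 warmup steps.

```python
# constant_with_warmup: the learning rate ramps linearly up to base_lr,
# then stays constant for the rest of training.
def lr_at(step: int, base_lr: float = 8e-06, warmup_steps: int = 18) -> float:
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

# Effective batch size: per-device batch x gradient accumulation steps.
effective_batch = 4 * 32  # = 128, matching total_train_batch_size
```

With these settings the optimizer takes one step per 128 examples, and the learning rate sits flat at 8e-06 after roughly the first 18 steps.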

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.1282 0
2.6692 0.0144 5 1.0838 251232
2.6572 0.0288 10 1.0019 504724
2.4487 0.0432 15 0.9898 759272
2.3685 0.0576 20 0.9814 1007344
2.2607 0.0721 25 0.9869 1261444
2.1841 0.0865 30 0.9878 1511004
1.9613 0.1009 35 0.9908 1763396
1.9138 0.1153 40 0.9865 2017584
1.7242 0.1297 45 0.9835 2271904
1.56 0.1441 50 0.9825 2528656
1.5102 0.1585 55 0.9806 2772068
1.4168 0.1729 60 0.9775 3023420
1.4362 0.1874 65 0.9754 3276920
1.3918 0.2018 70 0.9761 3531492
1.5127 0.2162 75 0.9706 3784992
1.3944 0.2306 80 0.9733 4032436
1.1925 0.2450 85 0.9723 4273560
1.183 0.2594 90 0.9640 4520508
1.2304 0.2738 95 0.9646 4770368
1.0872 0.2882 100 0.9648 5020016
1.1574 0.3026 105 0.9607 5276716
1.1035 0.3171 110 0.9611 5521372
1.0914 0.3315 115 0.9585 5776324
0.9998 0.3459 120 0.9598 6022272
0.9534 0.3603 125 0.9555 6260392
1.0917 0.3747 130 0.9535 6521380
1.1094 0.3891 135 0.9535 6769228
1.1871 0.4035 140 0.9526 7024704
0.9796 0.4179 145 0.9514 7273240
1.0659 0.4324 150 0.9495 7525180
1.1488 0.4468 155 0.9484 7775292
0.9887 0.4612 160 0.9497 8016808
1.1045 0.4756 165 0.9451 8266100
1.0371 0.4900 170 0.9465 8514128
1.0966 0.5044 175 0.9450 8763440
1.0408 0.5188 180 0.9460 9017676
1.0891 0.5332 185 0.9435 9265972
1.0561 0.5476 190 0.9450 9522024
0.9537 0.5621 195 0.9434 9764580
0.9373 0.5765 200 0.9431 10016796
1.1323 0.5909 205 0.9423 10269756
1.2019 0.6053 210 0.9438 10520656
0.9699 0.6197 215 0.9416 10771848
0.9654 0.6341 220 0.9426 11022436
0.9461 0.6485 225 0.9405 11274272
0.9865 0.6629 230 0.9414 11531652
0.9315 0.6774 235 0.9391 11784148
0.9826 0.6918 240 0.9406 12037420
0.984 0.7062 245 0.9396 12295780
1.1796 0.7206 250 0.9419 12550852
1.0881 0.7350 255 0.9367 12796424
0.8628 0.7494 260 0.9386 13048276
1.094 0.7638 265 0.9372 13302068
1.0862 0.7782 270 0.9385 13552976
1.0226 0.7926 275 0.9375 13805560
0.9964 0.8071 280 0.9359 14063732
1.0379 0.8215 285 0.9368 14323416
0.7735 0.8359 290 0.9365 14578864
0.8855 0.8503 295 0.9354 14831324
0.9687 0.8647 300 0.9368 15079640
1.0087 0.8791 305 0.9351 15336076
0.8832 0.8935 310 0.9368 15598480
0.9207 0.9079 315 0.9353 15852360
0.9436 0.9224 320 0.9372 16105580
1.0136 0.9368 325 0.9360 16360756
0.9331 0.9512 330 0.9334 16610568
0.8251 0.9656 335 0.9353 16866280
0.8415 0.9800 340 0.9334 17114340
1.0314 0.9944 345 0.9360 17367496
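
A back-of-the-envelope check derived from the last logged row above (illustrative only; it assumes the token counter includes every token in each batch):

```python
tokens_seen = 17_367_496    # input tokens seen at the last logged step
steps = 345                 # optimizer steps at that point
total_batch = 128           # total_train_batch_size

tokens_per_step = tokens_seen / steps               # ≈ 50,300 tokens/step
tokens_per_example = tokens_per_step / total_batch  # ≈ 393 tokens/example
```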

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

Base model

  • google/gemma-2-27b

This model is a direct fine-tune of the base model (one of 52 fine-tunes listed on the Hub).