collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9428
  • Num Input Tokens Seen: 17441212

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
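
A quick way to sanity-check the quantities derived from this list (a sketch in plain Python; the warmup-step rounding is an assumption, since the exact value depends on the trainer implementation):

```python
import math

# Values taken from the hyperparameter list above
train_batch_size = 4            # per-device micro-batch size
gradient_accumulation_steps = 32
warmup_ratio = 0.05
total_optimizer_steps = 350     # final step logged in the results table below

# Effective batch size: micro-batch size x accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported total_train_batch_size

# constant_with_warmup: the LR ramps up over the first warmup_ratio fraction
# of the run, then holds at 8e-06 for the remainder (rounding assumed here)
warmup_steps = math.ceil(warmup_ratio * total_optimizer_steps)
print(total_train_batch_size, warmup_steps)
```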

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.7026 | 0.0142 | 5 | 1.0845 | 250572 |
| 2.7951 | 0.0285 | 10 | 1.0059 | 495468 |
| 2.4024 | 0.0427 | 15 | 0.9903 | 740552 |
| 2.3813 | 0.0570 | 20 | 0.9834 | 983128 |
| 2.0909 | 0.0712 | 25 | 0.9822 | 1231348 |
| 2.0832 | 0.0855 | 30 | 0.9854 | 1485220 |
| 2.2061 | 0.0997 | 35 | 0.9889 | 1739976 |
| 1.8943 | 0.1140 | 40 | 0.9868 | 1994708 |
| 1.8237 | 0.1282 | 45 | 0.9788 | 2248268 |
| 1.6061 | 0.1425 | 50 | 0.9833 | 2491904 |
| 1.645 | 0.1567 | 55 | 0.9800 | 2745000 |
| 1.5498 | 0.1710 | 60 | 0.9792 | 2988148 |
| 1.2707 | 0.1852 | 65 | 0.9792 | 3237400 |
| 1.2508 | 0.1995 | 70 | 0.9746 | 3494232 |
| 1.2433 | 0.2137 | 75 | 0.9708 | 3747944 |
| 1.1545 | 0.2280 | 80 | 0.9691 | 3990240 |
| 1.3564 | 0.2422 | 85 | 0.9691 | 4234336 |
| 1.1692 | 0.2565 | 90 | 0.9681 | 4481372 |
| 1.1797 | 0.2707 | 95 | 0.9646 | 4733204 |
| 1.1292 | 0.2850 | 100 | 0.9630 | 4979876 |
| 1.034 | 0.2992 | 105 | 0.9641 | 5219284 |
| 1.0656 | 0.3135 | 110 | 0.9605 | 5467328 |
| 1.0678 | 0.3277 | 115 | 0.9588 | 5723652 |
| 1.0246 | 0.3420 | 120 | 0.9581 | 5975880 |
| 1.1025 | 0.3562 | 125 | 0.9580 | 6219980 |
| 1.0895 | 0.3705 | 130 | 0.9559 | 6475528 |
| 0.9828 | 0.3847 | 135 | 0.9546 | 6724216 |
| 0.9003 | 0.3990 | 140 | 0.9516 | 6971248 |
| 0.9099 | 0.4132 | 145 | 0.9538 | 7219644 |
| 0.9169 | 0.4275 | 150 | 0.9503 | 7471332 |
| 0.9124 | 0.4417 | 155 | 0.9517 | 7725516 |
| 0.9038 | 0.4560 | 160 | 0.9509 | 7974732 |
| 0.9577 | 0.4702 | 165 | 0.9490 | 8222880 |
| 1.0668 | 0.4845 | 170 | 0.9486 | 8463156 |
| 1.0556 | 0.4987 | 175 | 0.9484 | 8711816 |
| 0.958 | 0.5130 | 180 | 0.9446 | 8964120 |
| 0.7769 | 0.5272 | 185 | 0.9472 | 9212680 |
| 0.7975 | 0.5415 | 190 | 0.9450 | 9459576 |
| 0.8965 | 0.5557 | 195 | 0.9442 | 9711232 |
| 0.9835 | 0.5700 | 200 | 0.9461 | 9962788 |
| 0.9513 | 0.5842 | 205 | 0.9421 | 10215452 |
| 0.9281 | 0.5985 | 210 | 0.9448 | 10468768 |
| 0.819 | 0.6127 | 215 | 0.9426 | 10711836 |
| 0.8368 | 0.6269 | 220 | 0.9454 | 10963464 |
| 0.8332 | 0.6412 | 225 | 0.9419 | 11211872 |
| 1.1059 | 0.6554 | 230 | 0.9416 | 11468040 |
| 0.7919 | 0.6697 | 235 | 0.9409 | 11711864 |
| 0.7565 | 0.6839 | 240 | 0.9414 | 11960556 |
| 0.6964 | 0.6982 | 245 | 0.9424 | 12207416 |
| 0.92 | 0.7124 | 250 | 0.9419 | 12449244 |
| 0.7462 | 0.7267 | 255 | 0.9402 | 12696604 |
| 1.0246 | 0.7409 | 260 | 0.9435 | 12946160 |
| 0.7697 | 0.7552 | 265 | 0.9396 | 13199664 |
| 0.6771 | 0.7694 | 270 | 0.9407 | 13444784 |
| 0.7791 | 0.7837 | 275 | 0.9394 | 13700124 |
| 0.9775 | 0.7979 | 280 | 0.9422 | 13953992 |
| 0.9798 | 0.8122 | 285 | 0.9381 | 14204856 |
| 0.8106 | 0.8264 | 290 | 0.9395 | 14451212 |
| 0.8597 | 0.8407 | 295 | 0.9400 | 14702444 |
| 0.9122 | 0.8549 | 300 | 0.9450 | 14954496 |
| 0.8738 | 0.8692 | 305 | 0.9410 | 15199504 |
| 0.8448 | 0.8834 | 310 | 0.9375 | 15449288 |
| 0.7054 | 0.8977 | 315 | 0.9385 | 15692504 |
| 0.9606 | 0.9119 | 320 | 0.9380 | 15942552 |
| 1.0059 | 0.9262 | 325 | 0.9357 | 16189660 |
| 0.703 | 0.9404 | 330 | 0.9405 | 16441124 |
| 0.9094 | 0.9547 | 335 | 0.9358 | 16688128 |
| 0.8983 | 0.9689 | 340 | 0.9388 | 16938972 |
| 0.86 | 0.9832 | 345 | 0.9368 | 17187008 |
| 0.8023 | 0.9974 | 350 | 0.9428 | 17441212 |
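
The headline metrics at the top of this card can be cross-checked against the first and last rows of this table (a small sketch with values hard-coded from the table):

```python
# (step, validation_loss, input_tokens_seen) from the first and last evals
first = (0, 1.1282, 0)
last = (350, 0.9428, 17_441_212)

# The final eval matches the reported Loss and Num Input Tokens Seen
assert last[1] == 0.9428
assert last[2] == 17_441_212

drop = first[1] - last[1]
print(f"validation loss fell by {drop:.4f} over {last[0]} optimizer steps")
```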

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 27.2B params (Safetensors, tensor type BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd0

  • Base model: google/gemma-2-27b