collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9531
  • Num Input Tokens Seen: 19303096
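Pending a fuller model description, below is a minimal loading sketch. It is not part of the original card: the hub repo id is taken from this page's model tree, BF16 loading matches the stored tensor type, and the prompt is purely illustrative.

```python
# Minimal usage sketch (assumed, not from the original card): load the
# fine-tuned checkpoint and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires accelerate; places layers automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```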

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent Trainer configuration follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
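The per-device batch size of 4 combined with 32 gradient-accumulation steps yields the effective batch size of 128 listed above (4 × 32 on a single device). Here is a minimal sketch of how these settings map onto the Hugging Face Trainer API; the output directory and BF16 flag are assumptions, since the original training script is not included in the card.

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir and bf16 are assumptions; the Adam betas/epsilon match the card
# and are also the Trainer defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 effective train batch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumed, matching the BF16 checkpoint
)
```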

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-:|:-:|:-:|:-:|:-:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.4118 | 0.0133 | 5 | 1.1826 | 257860 |
| 1.2214 | 0.0266 | 10 | 1.0631 | 522104 |
| 1.0564 | 0.0400 | 15 | 1.0225 | 780584 |
| 0.7497 | 0.0533 | 20 | 1.0114 | 1036328 |
| 0.6525 | 0.0666 | 25 | 1.0250 | 1295960 |
| 0.4833 | 0.0799 | 30 | 1.0213 | 1552720 |
| 0.4335 | 0.0933 | 35 | 1.0174 | 1808712 |
| 0.3836 | 0.1066 | 40 | 1.0126 | 2061276 |
| 0.4176 | 0.1199 | 45 | 1.0100 | 2316160 |
| 0.3447 | 0.1332 | 50 | 1.0023 | 2572684 |
| 0.3402 | 0.1466 | 55 | 0.9984 | 2828056 |
| 0.3671 | 0.1599 | 60 | 0.9914 | 3084180 |
| 0.3605 | 0.1732 | 65 | 0.9913 | 3341524 |
| 0.3938 | 0.1865 | 70 | 0.9866 | 3608220 |
| 0.3298 | 0.1999 | 75 | 0.9840 | 3864084 |
| 0.3437 | 0.2132 | 80 | 0.9800 | 4125920 |
| 0.4241 | 0.2265 | 85 | 0.9796 | 4376200 |
| 0.3798 | 0.2398 | 90 | 0.9779 | 4636060 |
| 0.3598 | 0.2531 | 95 | 0.9747 | 4894472 |
| 0.401 | 0.2665 | 100 | 0.9730 | 5157976 |
| 0.3151 | 0.2798 | 105 | 0.9742 | 5414044 |
| 0.3781 | 0.2931 | 110 | 0.9713 | 5673020 |
| 0.4242 | 0.3064 | 115 | 0.9694 | 5930676 |
| 0.3515 | 0.3198 | 120 | 0.9692 | 6195360 |
| 0.2744 | 0.3331 | 125 | 0.9673 | 6452160 |
| 0.3215 | 0.3464 | 130 | 0.9655 | 6702024 |
| 0.3921 | 0.3597 | 135 | 0.9647 | 6952796 |
| 0.3987 | 0.3731 | 140 | 0.9633 | 7216020 |
| 0.3074 | 0.3864 | 145 | 0.9640 | 7474692 |
| 0.3314 | 0.3997 | 150 | 0.9631 | 7739548 |
| 0.3048 | 0.4130 | 155 | 0.9610 | 8005920 |
| 0.3229 | 0.4263 | 160 | 0.9626 | 8259444 |
| 0.2944 | 0.4397 | 165 | 0.9617 | 8514840 |
| 0.2932 | 0.4530 | 170 | 0.9619 | 8772880 |
| 0.2929 | 0.4663 | 175 | 0.9613 | 9032612 |
| 0.3491 | 0.4796 | 180 | 0.9602 | 9285936 |
| 0.3658 | 0.4930 | 185 | 0.9611 | 9541684 |
| 0.2627 | 0.5063 | 190 | 0.9609 | 9796096 |
| 0.3652 | 0.5196 | 195 | 0.9597 | 10054920 |
| 0.2474 | 0.5329 | 200 | 0.9593 | 10310456 |
| 0.3399 | 0.5463 | 205 | 0.9610 | 10566224 |
| 0.293 | 0.5596 | 210 | 0.9584 | 10821340 |
| 0.332 | 0.5729 | 215 | 0.9575 | 11080028 |
| 0.3365 | 0.5862 | 220 | 0.9576 | 11339624 |
| 0.3079 | 0.5996 | 225 | 0.9569 | 11596368 |
| 0.3383 | 0.6129 | 230 | 0.9568 | 11846020 |
| 0.3074 | 0.6262 | 235 | 0.9568 | 12097444 |
| 0.2863 | 0.6395 | 240 | 0.9555 | 12360820 |
| 0.3494 | 0.6528 | 245 | 0.9550 | 12619744 |
| 0.3301 | 0.6662 | 250 | 0.9564 | 12879604 |
| 0.2942 | 0.6795 | 255 | 0.9556 | 13133500 |
| 0.2745 | 0.6928 | 260 | 0.9545 | 13387256 |
| 0.2444 | 0.7061 | 265 | 0.9553 | 13653368 |
| 0.2921 | 0.7195 | 270 | 0.9563 | 13909732 |
| 0.256 | 0.7328 | 275 | 0.9558 | 14169440 |
| 0.3005 | 0.7461 | 280 | 0.9538 | 14430448 |
| 0.2816 | 0.7594 | 285 | 0.9529 | 14687952 |
| 0.3103 | 0.7728 | 290 | 0.9544 | 14938896 |
| 0.2936 | 0.7861 | 295 | 0.9562 | 15191276 |
| 0.3045 | 0.7994 | 300 | 0.9557 | 15445016 |
| 0.3128 | 0.8127 | 305 | 0.9543 | 15698372 |
| 0.247 | 0.8260 | 310 | 0.9546 | 15955200 |
| 0.3089 | 0.8394 | 315 | 0.9561 | 16215480 |
| 0.2694 | 0.8527 | 320 | 0.9561 | 16473036 |
| 0.2898 | 0.8660 | 325 | 0.9539 | 16732368 |
| 0.324 | 0.8793 | 330 | 0.9547 | 16988880 |
| 0.3019 | 0.8927 | 335 | 0.9555 | 17249604 |
| 0.3881 | 0.9060 | 340 | 0.9563 | 17504072 |
| 0.1919 | 0.9193 | 345 | 0.9545 | 17763736 |
| 0.2813 | 0.9326 | 350 | 0.9524 | 18029128 |
| 0.3241 | 0.9460 | 355 | 0.9538 | 18283752 |
| 0.2958 | 0.9593 | 360 | 0.9570 | 18535884 |
| 0.3128 | 0.9726 | 365 | 0.9527 | 18795248 |
| 0.287 | 0.9859 | 370 | 0.9508 | 19049296 |
| 0.2516 | 0.9993 | 375 | 0.9531 | 19303096 |
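Most of the improvement arrives within the first 20 optimizer steps, after which validation loss drifts down slowly from about 1.01 to 0.95. Read as mean token-level cross-entropy in nats (the Trainer's usual reporting convention, assumed here), these losses convert directly to perplexities:

```python
# Quick conversion of the reported validation losses to perplexities,
# assuming they are mean token-level cross-entropies in nats.
import math

initial_loss = 1.2335  # step 0, before any updates
final_loss = 0.9531    # step 375, end of the single epoch

print(f"initial perplexity: {math.exp(initial_loss):.2f}")  # ~3.43
print(f"final perplexity:   {math.exp(final_loss):.2f}")    # ~2.59
```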

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model details

  • Format: Safetensors
  • Model size: 9.24B params
  • Tensor type: BF16