# collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9531
- Num Input Tokens Seen: 19303096
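
For quick reference, a minimal usage sketch (not part of the original card; the prompt and dtype are illustrative assumptions) for loading this checkpoint with the `transformers` library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; pick a dtype your hardware supports
    device_map="auto",           # requires the accelerate package
)

prompt = "The quick brown fox"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```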
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
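
The total train batch size follows from the per-device batch size and gradient accumulation: 4 × 32 = 128 (assuming a single device). For orientation only, here is a sketch of how these values would map onto `transformers.TrainingArguments`; the actual training script is not published with this card, and `output_dir` is a hypothetical name:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above; not the author's script.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # optimizer settings listed above:
    adam_beta2=0.999,  # Adam with betas=(0.9, 0.999)
    adam_epsilon=1e-8, # and epsilon=1e-08
)
```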
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.4118 | 0.0133 | 5 | 1.1826 | 257860 |
| 1.2214 | 0.0266 | 10 | 1.0631 | 522104 |
| 1.0564 | 0.0400 | 15 | 1.0225 | 780584 |
| 0.7497 | 0.0533 | 20 | 1.0114 | 1036328 |
| 0.6525 | 0.0666 | 25 | 1.0250 | 1295960 |
| 0.4833 | 0.0799 | 30 | 1.0213 | 1552720 |
| 0.4335 | 0.0933 | 35 | 1.0174 | 1808712 |
| 0.3836 | 0.1066 | 40 | 1.0126 | 2061276 |
| 0.4176 | 0.1199 | 45 | 1.0100 | 2316160 |
| 0.3447 | 0.1332 | 50 | 1.0023 | 2572684 |
| 0.3402 | 0.1466 | 55 | 0.9984 | 2828056 |
| 0.3671 | 0.1599 | 60 | 0.9914 | 3084180 |
| 0.3605 | 0.1732 | 65 | 0.9913 | 3341524 |
| 0.3938 | 0.1865 | 70 | 0.9866 | 3608220 |
| 0.3298 | 0.1999 | 75 | 0.9840 | 3864084 |
| 0.3437 | 0.2132 | 80 | 0.9800 | 4125920 |
| 0.4241 | 0.2265 | 85 | 0.9796 | 4376200 |
| 0.3798 | 0.2398 | 90 | 0.9779 | 4636060 |
| 0.3598 | 0.2531 | 95 | 0.9747 | 4894472 |
| 0.401 | 0.2665 | 100 | 0.9730 | 5157976 |
| 0.3151 | 0.2798 | 105 | 0.9742 | 5414044 |
| 0.3781 | 0.2931 | 110 | 0.9713 | 5673020 |
| 0.4242 | 0.3064 | 115 | 0.9694 | 5930676 |
| 0.3515 | 0.3198 | 120 | 0.9692 | 6195360 |
| 0.2744 | 0.3331 | 125 | 0.9673 | 6452160 |
| 0.3215 | 0.3464 | 130 | 0.9655 | 6702024 |
| 0.3921 | 0.3597 | 135 | 0.9647 | 6952796 |
| 0.3987 | 0.3731 | 140 | 0.9633 | 7216020 |
| 0.3074 | 0.3864 | 145 | 0.9640 | 7474692 |
| 0.3314 | 0.3997 | 150 | 0.9631 | 7739548 |
| 0.3048 | 0.4130 | 155 | 0.9610 | 8005920 |
| 0.3229 | 0.4263 | 160 | 0.9626 | 8259444 |
| 0.2944 | 0.4397 | 165 | 0.9617 | 8514840 |
| 0.2932 | 0.4530 | 170 | 0.9619 | 8772880 |
| 0.2929 | 0.4663 | 175 | 0.9613 | 9032612 |
| 0.3491 | 0.4796 | 180 | 0.9602 | 9285936 |
| 0.3658 | 0.4930 | 185 | 0.9611 | 9541684 |
| 0.2627 | 0.5063 | 190 | 0.9609 | 9796096 |
| 0.3652 | 0.5196 | 195 | 0.9597 | 10054920 |
| 0.2474 | 0.5329 | 200 | 0.9593 | 10310456 |
| 0.3399 | 0.5463 | 205 | 0.9610 | 10566224 |
| 0.293 | 0.5596 | 210 | 0.9584 | 10821340 |
| 0.332 | 0.5729 | 215 | 0.9575 | 11080028 |
| 0.3365 | 0.5862 | 220 | 0.9576 | 11339624 |
| 0.3079 | 0.5996 | 225 | 0.9569 | 11596368 |
| 0.3383 | 0.6129 | 230 | 0.9568 | 11846020 |
| 0.3074 | 0.6262 | 235 | 0.9568 | 12097444 |
| 0.2863 | 0.6395 | 240 | 0.9555 | 12360820 |
| 0.3494 | 0.6528 | 245 | 0.9550 | 12619744 |
| 0.3301 | 0.6662 | 250 | 0.9564 | 12879604 |
| 0.2942 | 0.6795 | 255 | 0.9556 | 13133500 |
| 0.2745 | 0.6928 | 260 | 0.9545 | 13387256 |
| 0.2444 | 0.7061 | 265 | 0.9553 | 13653368 |
| 0.2921 | 0.7195 | 270 | 0.9563 | 13909732 |
| 0.256 | 0.7328 | 275 | 0.9558 | 14169440 |
| 0.3005 | 0.7461 | 280 | 0.9538 | 14430448 |
| 0.2816 | 0.7594 | 285 | 0.9529 | 14687952 |
| 0.3103 | 0.7728 | 290 | 0.9544 | 14938896 |
| 0.2936 | 0.7861 | 295 | 0.9562 | 15191276 |
| 0.3045 | 0.7994 | 300 | 0.9557 | 15445016 |
| 0.3128 | 0.8127 | 305 | 0.9543 | 15698372 |
| 0.247 | 0.8260 | 310 | 0.9546 | 15955200 |
| 0.3089 | 0.8394 | 315 | 0.9561 | 16215480 |
| 0.2694 | 0.8527 | 320 | 0.9561 | 16473036 |
| 0.2898 | 0.8660 | 325 | 0.9539 | 16732368 |
| 0.324 | 0.8793 | 330 | 0.9547 | 16988880 |
| 0.3019 | 0.8927 | 335 | 0.9555 | 17249604 |
| 0.3881 | 0.9060 | 340 | 0.9563 | 17504072 |
| 0.1919 | 0.9193 | 345 | 0.9545 | 17763736 |
| 0.2813 | 0.9326 | 350 | 0.9524 | 18029128 |
| 0.3241 | 0.9460 | 355 | 0.9538 | 18283752 |
| 0.2958 | 0.9593 | 360 | 0.9570 | 18535884 |
| 0.3128 | 0.9726 | 365 | 0.9527 | 18795248 |
| 0.287 | 0.9859 | 370 | 0.9508 | 19049296 |
| 0.2516 | 0.9993 | 375 | 0.9531 | 19303096 |
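
The final validation loss can be read as a perplexity via the standard identity perplexity = exp(loss) for cross-entropy measured in nats (a derived figure, not reported in the original card):

```python
import math

final_eval_loss = 0.9531          # from the last row of the table above
print(math.exp(final_eval_loss))  # ~2.59 validation perplexity
```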
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1