---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9361
- Num Input Tokens Seen: 21319328

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
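For reference, the hyperparameters above can be expressed as a training configuration. The sketch below is not the original training script; it only assumes (per the `trl`/`sft` tags) that TRL's `SFTConfig` was used, and the `output_dir` is a placeholder:

```python
# Minimal sketch reconstructing the reported hyperparameters with TRL's SFTConfig.
# Assumptions: the run used TRL's SFT trainer (suggested by the trl/sft tags);
# output_dir is a placeholder, and the training dataset is unknown.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```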
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.9736 | 0.0120 | 5 | 1.0890 | 257960 |
| 2.9201 | 0.0240 | 10 | 1.0094 | 513580 |
| 2.7063 | 0.0360 | 15 | 0.9961 | 772336 |
| 2.7066 | 0.0479 | 20 | 0.9889 | 1027836 |
| 2.5663 | 0.0599 | 25 | 0.9868 | 1285388 |
| 2.586 | 0.0719 | 30 | 0.9896 | 1541024 |
| 2.5497 | 0.0839 | 35 | 0.9909 | 1796588 |
| 2.325 | 0.0959 | 40 | 0.9916 | 2051248 |
| 2.1303 | 0.1079 | 45 | 0.9928 | 2316512 |
| 2.1498 | 0.1198 | 50 | 0.9901 | 2575448 |
| 2.1035 | 0.1318 | 55 | 0.9887 | 2827576 |
| 2.0106 | 0.1438 | 60 | 0.9895 | 3085924 |
| 1.9861 | 0.1558 | 65 | 0.9849 | 3344592 |
| 1.8483 | 0.1678 | 70 | 0.9882 | 3587496 |
| 1.698 | 0.1798 | 75 | 0.9837 | 3845228 |
| 1.5455 | 0.1917 | 80 | 0.9820 | 4094024 |
| 1.7371 | 0.2037 | 85 | 0.9779 | 4352288 |
| 1.6068 | 0.2157 | 90 | 0.9755 | 4606816 |
| 1.6234 | 0.2277 | 95 | 0.9705 | 4865000 |
| 1.6119 | 0.2397 | 100 | 0.9710 | 5122860 |
| 1.4461 | 0.2517 | 105 | 0.9661 | 5380192 |
| 1.5323 | 0.2637 | 110 | 0.9648 | 5638952 |
| 1.48 | 0.2756 | 115 | 0.9644 | 5895124 |
| 1.5077 | 0.2876 | 120 | 0.9632 | 6150672 |
| 1.3105 | 0.2996 | 125 | 0.9605 | 6404592 |
| 1.5438 | 0.3116 | 130 | 0.9604 | 6667232 |
| 1.6025 | 0.3236 | 135 | 0.9587 | 6919444 |
| 1.5647 | 0.3356 | 140 | 0.9575 | 7171560 |
| 1.3177 | 0.3475 | 145 | 0.9598 | 7427412 |
| 1.4743 | 0.3595 | 150 | 0.9563 | 7690832 |
| 1.6544 | 0.3715 | 155 | 0.9547 | 7949984 |
| 1.397 | 0.3835 | 160 | 0.9584 | 8205800 |
| 1.3666 | 0.3955 | 165 | 0.9543 | 8464028 |
| 1.5154 | 0.4075 | 170 | 0.9527 | 8713484 |
| 1.5427 | 0.4194 | 175 | 0.9557 | 8971692 |
| 1.2568 | 0.4314 | 180 | 0.9521 | 9225284 |
| 1.3871 | 0.4434 | 185 | 0.9520 | 9479360 |
| 1.5084 | 0.4554 | 190 | 0.9521 | 9730040 |
| 1.4411 | 0.4674 | 195 | 0.9499 | 9989888 |
| 1.3642 | 0.4794 | 200 | 0.9487 | 10253880 |
| 1.2564 | 0.4913 | 205 | 0.9472 | 10506892 |
| 1.4515 | 0.5033 | 210 | 0.9496 | 10762052 |
| 1.2647 | 0.5153 | 215 | 0.9494 | 11010792 |
| 1.3365 | 0.5273 | 220 | 0.9491 | 11258360 |
| 1.4796 | 0.5393 | 225 | 0.9486 | 11509984 |
| 1.4464 | 0.5513 | 230 | 0.9468 | 11768156 |
| 1.1882 | 0.5633 | 235 | 0.9482 | 12022340 |
| 1.4812 | 0.5752 | 240 | 0.9485 | 12270644 |
| 1.3927 | 0.5872 | 245 | 0.9466 | 12529864 |
| 1.5076 | 0.5992 | 250 | 0.9475 | 12788428 |
| 1.3727 | 0.6112 | 255 | 0.9459 | 13039508 |
| 1.2361 | 0.6232 | 260 | 0.9476 | 13292956 |
| 1.3745 | 0.6352 | 265 | 0.9443 | 13548132 |
| 1.3198 | 0.6471 | 270 | 0.9442 | 13805636 |
| 1.2179 | 0.6591 | 275 | 0.9436 | 14058880 |
| 1.4035 | 0.6711 | 280 | 0.9463 | 14318400 |
| 1.2952 | 0.6831 | 285 | 0.9440 | 14568908 |
| 1.291 | 0.6951 | 290 | 0.9439 | 14823440 |
| 1.4132 | 0.7071 | 295 | 0.9436 | 15082248 |
| 1.5722 | 0.7190 | 300 | 0.9429 | 15338164 |
| 1.2473 | 0.7310 | 305 | 0.9416 | 15601888 |
| 1.2805 | 0.7430 | 310 | 0.9420 | 15855996 |
| 1.1853 | 0.7550 | 315 | 0.9401 | 16103316 |
| 1.4429 | 0.7670 | 320 | 0.9411 | 16354352 |
| 1.0744 | 0.7790 | 325 | 0.9417 | 16609264 |
| 1.2779 | 0.7910 | 330 | 0.9432 | 16869072 |
| 1.4178 | 0.8029 | 335 | 0.9407 | 17125932 |
| 1.3986 | 0.8149 | 340 | 0.9414 | 17379164 |
| 1.1471 | 0.8269 | 345 | 0.9404 | 17628696 |
| 1.1763 | 0.8389 | 350 | 0.9426 | 17884156 |
| 1.2251 | 0.8509 | 355 | 0.9389 | 18134160 |
| 1.2366 | 0.8629 | 360 | 0.9409 | 18391736 |
| 1.3086 | 0.8748 | 365 | 0.9392 | 18644984 |
| 1.2506 | 0.8868 | 370 | 0.9405 | 18902772 |
| 1.355 | 0.8988 | 375 | 0.9384 | 19165216 |
| 1.3424 | 0.9108 | 380 | 0.9400 | 19415060 |
| 1.3585 | 0.9228 | 385 | 0.9390 | 19668820 |
| 1.3487 | 0.9348 | 390 | 0.9425 | 19922732 |
| 1.4113 | 0.9467 | 395 | 0.9402 | 20187160 |
| 1.5089 | 0.9587 | 400 | 0.9377 | 20438732 |
| 1.3723 | 0.9707 | 405 | 0.9376 | 20699200 |
| 1.2797 | 0.9827 | 410 | 0.9422 | 20957600 |
| 1.3996 | 0.9947 | 415 | 0.9367 | 21217992 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
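As a quick check that the checkpoint loads with the library versions listed above, here is a minimal inference sketch; the repository id is assumed from the model name and may differ from the actual hosted location:

```python
# Minimal loading/inference sketch for this fine-tuned Gemma-2-27B checkpoint.
# Assumption: the repository id below is inferred from the model name and is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 is typically run in bfloat16
    device_map="auto",           # requires accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```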