---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9369
- Num Input Tokens Seen: 17420796

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
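The training script itself is not included in the card, but the hyperparameters above map directly onto Hugging Face `TrainingArguments`. The sketch below is a rough reconstruction under stated assumptions: the `output_dir`, the `bf16` precision, and a single-device setup (4 per device × 32 accumulation steps = 128 effective) are guesses, not documented facts.

```python
from transformers import TrainingArguments

# A minimal sketch of the documented hyperparameters; precision and
# device count are assumptions, not taken from the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,       # effective batch of 128 with 32 accumulation steps on one device
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",               # the results table evaluates every 5 steps
    eval_steps=5,
    logging_steps=5,
    include_num_input_tokens_seen=True,  # the card reports "Num Input Tokens Seen"
    bf16=True,                           # assumed precision for a 27B Gemma-2 model
)
```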
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.6692 | 0.0144 | 5 | 1.0838 | 251232 |
| 2.6572 | 0.0288 | 10 | 1.0019 | 504724 |
| 2.4487 | 0.0432 | 15 | 0.9898 | 759272 |
| 2.3685 | 0.0576 | 20 | 0.9814 | 1007344 |
| 2.2607 | 0.0721 | 25 | 0.9869 | 1261444 |
| 2.1841 | 0.0865 | 30 | 0.9878 | 1511004 |
| 1.9613 | 0.1009 | 35 | 0.9908 | 1763396 |
| 1.9138 | 0.1153 | 40 | 0.9865 | 2017584 |
| 1.7242 | 0.1297 | 45 | 0.9835 | 2271904 |
| 1.56 | 0.1441 | 50 | 0.9825 | 2528656 |
| 1.5102 | 0.1585 | 55 | 0.9806 | 2772068 |
| 1.4168 | 0.1729 | 60 | 0.9775 | 3023420 |
| 1.4362 | 0.1874 | 65 | 0.9754 | 3276920 |
| 1.3918 | 0.2018 | 70 | 0.9761 | 3531492 |
| 1.5127 | 0.2162 | 75 | 0.9706 | 3784992 |
| 1.3944 | 0.2306 | 80 | 0.9733 | 4032436 |
| 1.1925 | 0.2450 | 85 | 0.9723 | 4273560 |
| 1.183 | 0.2594 | 90 | 0.9640 | 4520508 |
| 1.2304 | 0.2738 | 95 | 0.9646 | 4770368 |
| 1.0872 | 0.2882 | 100 | 0.9648 | 5020016 |
| 1.1574 | 0.3026 | 105 | 0.9607 | 5276716 |
| 1.1035 | 0.3171 | 110 | 0.9611 | 5521372 |
| 1.0914 | 0.3315 | 115 | 0.9585 | 5776324 |
| 0.9998 | 0.3459 | 120 | 0.9598 | 6022272 |
| 0.9534 | 0.3603 | 125 | 0.9555 | 6260392 |
| 1.0917 | 0.3747 | 130 | 0.9535 | 6521380 |
| 1.1094 | 0.3891 | 135 | 0.9535 | 6769228 |
| 1.1871 | 0.4035 | 140 | 0.9526 | 7024704 |
| 0.9796 | 0.4179 | 145 | 0.9514 | 7273240 |
| 1.0659 | 0.4324 | 150 | 0.9495 | 7525180 |
| 1.1488 | 0.4468 | 155 | 0.9484 | 7775292 |
| 0.9887 | 0.4612 | 160 | 0.9497 | 8016808 |
| 1.1045 | 0.4756 | 165 | 0.9451 | 8266100 |
| 1.0371 | 0.4900 | 170 | 0.9465 | 8514128 |
| 1.0966 | 0.5044 | 175 | 0.9450 | 8763440 |
| 1.0408 | 0.5188 | 180 | 0.9460 | 9017676 |
| 1.0891 | 0.5332 | 185 | 0.9435 | 9265972 |
| 1.0561 | 0.5476 | 190 | 0.9450 | 9522024 |
| 0.9537 | 0.5621 | 195 | 0.9434 | 9764580 |
| 0.9373 | 0.5765 | 200 | 0.9431 | 10016796 |
| 1.1323 | 0.5909 | 205 | 0.9423 | 10269756 |
| 1.2019 | 0.6053 | 210 | 0.9438 | 10520656 |
| 0.9699 | 0.6197 | 215 | 0.9416 | 10771848 |
| 0.9654 | 0.6341 | 220 | 0.9426 | 11022436 |
| 0.9461 | 0.6485 | 225 | 0.9405 | 11274272 |
| 0.9865 | 0.6629 | 230 | 0.9414 | 11531652 |
| 0.9315 | 0.6774 | 235 | 0.9391 | 11784148 |
| 0.9826 | 0.6918 | 240 | 0.9406 | 12037420 |
| 0.984 | 0.7062 | 245 | 0.9396 | 12295780 |
| 1.1796 | 0.7206 | 250 | 0.9419 | 12550852 |
| 1.0881 | 0.7350 | 255 | 0.9367 | 12796424 |
| 0.8628 | 0.7494 | 260 | 0.9386 | 13048276 |
| 1.094 | 0.7638 | 265 | 0.9372 | 13302068 |
| 1.0862 | 0.7782 | 270 | 0.9385 | 13552976 |
| 1.0226 | 0.7926 | 275 | 0.9375 | 13805560 |
| 0.9964 | 0.8071 | 280 | 0.9359 | 14063732 |
| 1.0379 | 0.8215 | 285 | 0.9368 | 14323416 |
| 0.7735 | 0.8359 | 290 | 0.9365 | 14578864 |
| 0.8855 | 0.8503 | 295 | 0.9354 | 14831324 |
| 0.9687 | 0.8647 | 300 | 0.9368 | 15079640 |
| 1.0087 | 0.8791 | 305 | 0.9351 | 15336076 |
| 0.8832 | 0.8935 | 310 | 0.9368 | 15598480 |
| 0.9207 | 0.9079 | 315 | 0.9353 | 15852360 |
| 0.9436 | 0.9224 | 320 | 0.9372 | 16105580 |
| 1.0136 | 0.9368 | 325 | 0.9360 | 16360756 |
| 0.9331 | 0.9512 | 330 | 0.9334 | 16610568 |
| 0.8251 | 0.9656 | 335 | 0.9353 | 16866280 |
| 0.8415 | 0.9800 | 340 | 0.9334 | 17114340 |
| 1.0314 | 0.9944 | 345 | 0.9360 | 17367496 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
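The intended-use section above is still a placeholder, so treat the following purely as a loading sketch: the repository id is hypothetical (the card gives no Hub namespace), the bf16 dtype is assumed, and `device_map="auto"` assumes `accelerate` is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; a 27B model needs roughly 54 GB in bf16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```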