---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9369
- Num Input Tokens Seen: 17420796

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
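The training script itself is not included in the card, but the hyperparameters above map directly onto Hugging Face `TrainingArguments`. The sketch below is a rough reconstruction under stated assumptions: the `output_dir`, the `bf16` precision, and a single-device setup (4 per device × 32 accumulation steps = 128 effective) are guesses, not documented facts.

```python
from transformers import TrainingArguments

# A minimal sketch of the documented hyperparameters; precision and
# device count are assumptions, not taken from the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,       # effective batch of 128 with 32 accumulation steps on one device
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",               # the results table evaluates every 5 steps
    eval_steps=5,
    logging_steps=5,
    include_num_input_tokens_seen=True,  # the card reports "Num Input Tokens Seen"
    bf16=True,                           # assumed precision for a 27B Gemma-2 model
)
```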
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.6692 | 0.0144 | 5 | 1.0838 | 251232 |
| 2.6572 | 0.0288 | 10 | 1.0019 | 504724 |
| 2.4487 | 0.0432 | 15 | 0.9898 | 759272 |
| 2.3685 | 0.0576 | 20 | 0.9814 | 1007344 |
| 2.2607 | 0.0721 | 25 | 0.9869 | 1261444 |
| 2.1841 | 0.0865 | 30 | 0.9878 | 1511004 |
| 1.9613 | 0.1009 | 35 | 0.9908 | 1763396 |
| 1.9138 | 0.1153 | 40 | 0.9865 | 2017584 |
| 1.7242 | 0.1297 | 45 | 0.9835 | 2271904 |
| 1.56 | 0.1441 | 50 | 0.9825 | 2528656 |
| 1.5102 | 0.1585 | 55 | 0.9806 | 2772068 |
| 1.4168 | 0.1729 | 60 | 0.9775 | 3023420 |
| 1.4362 | 0.1874 | 65 | 0.9754 | 3276920 |
| 1.3918 | 0.2018 | 70 | 0.9761 | 3531492 |
| 1.5127 | 0.2162 | 75 | 0.9706 | 3784992 |
| 1.3944 | 0.2306 | 80 | 0.9733 | 4032436 |
| 1.1925 | 0.2450 | 85 | 0.9723 | 4273560 |
| 1.183 | 0.2594 | 90 | 0.9640 | 4520508 |
| 1.2304 | 0.2738 | 95 | 0.9646 | 4770368 |
| 1.0872 | 0.2882 | 100 | 0.9648 | 5020016 |
| 1.1574 | 0.3026 | 105 | 0.9607 | 5276716 |
| 1.1035 | 0.3171 | 110 | 0.9611 | 5521372 |
| 1.0914 | 0.3315 | 115 | 0.9585 | 5776324 |
| 0.9998 | 0.3459 | 120 | 0.9598 | 6022272 |
| 0.9534 | 0.3603 | 125 | 0.9555 | 6260392 |
| 1.0917 | 0.3747 | 130 | 0.9535 | 6521380 |
| 1.1094 | 0.3891 | 135 | 0.9535 | 6769228 |
| 1.1871 | 0.4035 | 140 | 0.9526 | 7024704 |
| 0.9796 | 0.4179 | 145 | 0.9514 | 7273240 |
| 1.0659 | 0.4324 | 150 | 0.9495 | 7525180 |
| 1.1488 | 0.4468 | 155 | 0.9484 | 7775292 |
| 0.9887 | 0.4612 | 160 | 0.9497 | 8016808 |
| 1.1045 | 0.4756 | 165 | 0.9451 | 8266100 |
| 1.0371 | 0.4900 | 170 | 0.9465 | 8514128 |
| 1.0966 | 0.5044 | 175 | 0.9450 | 8763440 |
| 1.0408 | 0.5188 | 180 | 0.9460 | 9017676 |
| 1.0891 | 0.5332 | 185 | 0.9435 | 9265972 |
| 1.0561 | 0.5476 | 190 | 0.9450 | 9522024 |
| 0.9537 | 0.5621 | 195 | 0.9434 | 9764580 |
| 0.9373 | 0.5765 | 200 | 0.9431 | 10016796 |
| 1.1323 | 0.5909 | 205 | 0.9423 | 10269756 |
| 1.2019 | 0.6053 | 210 | 0.9438 | 10520656 |
| 0.9699 | 0.6197 | 215 | 0.9416 | 10771848 |
| 0.9654 | 0.6341 | 220 | 0.9426 | 11022436 |
| 0.9461 | 0.6485 | 225 | 0.9405 | 11274272 |
| 0.9865 | 0.6629 | 230 | 0.9414 | 11531652 |
| 0.9315 | 0.6774 | 235 | 0.9391 | 11784148 |
| 0.9826 | 0.6918 | 240 | 0.9406 | 12037420 |
| 0.984 | 0.7062 | 245 | 0.9396 | 12295780 |
| 1.1796 | 0.7206 | 250 | 0.9419 | 12550852 |
| 1.0881 | 0.7350 | 255 | 0.9367 | 12796424 |
| 0.8628 | 0.7494 | 260 | 0.9386 | 13048276 |
| 1.094 | 0.7638 | 265 | 0.9372 | 13302068 |
| 1.0862 | 0.7782 | 270 | 0.9385 | 13552976 |
| 1.0226 | 0.7926 | 275 | 0.9375 | 13805560 |
| 0.9964 | 0.8071 | 280 | 0.9359 | 14063732 |
| 1.0379 | 0.8215 | 285 | 0.9368 | 14323416 |
| 0.7735 | 0.8359 | 290 | 0.9365 | 14578864 |
| 0.8855 | 0.8503 | 295 | 0.9354 | 14831324 |
| 0.9687 | 0.8647 | 300 | 0.9368 | 15079640 |
| 1.0087 | 0.8791 | 305 | 0.9351 | 15336076 |
| 0.8832 | 0.8935 | 310 | 0.9368 | 15598480 |
| 0.9207 | 0.9079 | 315 | 0.9353 | 15852360 |
| 0.9436 | 0.9224 | 320 | 0.9372 | 16105580 |
| 1.0136 | 0.9368 | 325 | 0.9360 | 16360756 |
| 0.9331 | 0.9512 | 330 | 0.9334 | 16610568 |
| 0.8251 | 0.9656 | 335 | 0.9353 | 16866280 |
| 0.8415 | 0.9800 | 340 | 0.9334 | 17114340 |
| 1.0314 | 0.9944 | 345 | 0.9360 | 17367496 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
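The intended-use section above is still a placeholder, so treat the following purely as a loading sketch: the repository id is hypothetical (the card gives no Hub namespace), the bf16 dtype is assumed, and `device_map="auto"` assumes `accelerate` is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; a 27B model needs roughly 54 GB in bf16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```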