---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9361
- Num Input Tokens Seen: 21319328

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
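For reference, the hyperparameters above can be expressed as a training configuration. The sketch below is not the original training script; it only assumes (per the `trl`/`sft` tags) that TRL's `SFTConfig` was used, and the `output_dir` is a placeholder:

```python
# Minimal sketch reconstructing the reported hyperparameters with TRL's SFTConfig.
# Assumptions: the run used TRL's SFT trainer (suggested by the trl/sft tags);
# output_dir is a placeholder, and the training dataset is unknown.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```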
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.9736 | 0.0120 | 5 | 1.0890 | 257960 |
| 2.9201 | 0.0240 | 10 | 1.0094 | 513580 |
| 2.7063 | 0.0360 | 15 | 0.9961 | 772336 |
| 2.7066 | 0.0479 | 20 | 0.9889 | 1027836 |
| 2.5663 | 0.0599 | 25 | 0.9868 | 1285388 |
| 2.586 | 0.0719 | 30 | 0.9896 | 1541024 |
| 2.5497 | 0.0839 | 35 | 0.9909 | 1796588 |
| 2.325 | 0.0959 | 40 | 0.9916 | 2051248 |
| 2.1303 | 0.1079 | 45 | 0.9928 | 2316512 |
| 2.1498 | 0.1198 | 50 | 0.9901 | 2575448 |
| 2.1035 | 0.1318 | 55 | 0.9887 | 2827576 |
| 2.0106 | 0.1438 | 60 | 0.9895 | 3085924 |
| 1.9861 | 0.1558 | 65 | 0.9849 | 3344592 |
| 1.8483 | 0.1678 | 70 | 0.9882 | 3587496 |
| 1.698 | 0.1798 | 75 | 0.9837 | 3845228 |
| 1.5455 | 0.1917 | 80 | 0.9820 | 4094024 |
| 1.7371 | 0.2037 | 85 | 0.9779 | 4352288 |
| 1.6068 | 0.2157 | 90 | 0.9755 | 4606816 |
| 1.6234 | 0.2277 | 95 | 0.9705 | 4865000 |
| 1.6119 | 0.2397 | 100 | 0.9710 | 5122860 |
| 1.4461 | 0.2517 | 105 | 0.9661 | 5380192 |
| 1.5323 | 0.2637 | 110 | 0.9648 | 5638952 |
| 1.48 | 0.2756 | 115 | 0.9644 | 5895124 |
| 1.5077 | 0.2876 | 120 | 0.9632 | 6150672 |
| 1.3105 | 0.2996 | 125 | 0.9605 | 6404592 |
| 1.5438 | 0.3116 | 130 | 0.9604 | 6667232 |
| 1.6025 | 0.3236 | 135 | 0.9587 | 6919444 |
| 1.5647 | 0.3356 | 140 | 0.9575 | 7171560 |
| 1.3177 | 0.3475 | 145 | 0.9598 | 7427412 |
| 1.4743 | 0.3595 | 150 | 0.9563 | 7690832 |
| 1.6544 | 0.3715 | 155 | 0.9547 | 7949984 |
| 1.397 | 0.3835 | 160 | 0.9584 | 8205800 |
| 1.3666 | 0.3955 | 165 | 0.9543 | 8464028 |
| 1.5154 | 0.4075 | 170 | 0.9527 | 8713484 |
| 1.5427 | 0.4194 | 175 | 0.9557 | 8971692 |
| 1.2568 | 0.4314 | 180 | 0.9521 | 9225284 |
| 1.3871 | 0.4434 | 185 | 0.9520 | 9479360 |
| 1.5084 | 0.4554 | 190 | 0.9521 | 9730040 |
| 1.4411 | 0.4674 | 195 | 0.9499 | 9989888 |
| 1.3642 | 0.4794 | 200 | 0.9487 | 10253880 |
| 1.2564 | 0.4913 | 205 | 0.9472 | 10506892 |
| 1.4515 | 0.5033 | 210 | 0.9496 | 10762052 |
| 1.2647 | 0.5153 | 215 | 0.9494 | 11010792 |
| 1.3365 | 0.5273 | 220 | 0.9491 | 11258360 |
| 1.4796 | 0.5393 | 225 | 0.9486 | 11509984 |
| 1.4464 | 0.5513 | 230 | 0.9468 | 11768156 |
| 1.1882 | 0.5633 | 235 | 0.9482 | 12022340 |
| 1.4812 | 0.5752 | 240 | 0.9485 | 12270644 |
| 1.3927 | 0.5872 | 245 | 0.9466 | 12529864 |
| 1.5076 | 0.5992 | 250 | 0.9475 | 12788428 |
| 1.3727 | 0.6112 | 255 | 0.9459 | 13039508 |
| 1.2361 | 0.6232 | 260 | 0.9476 | 13292956 |
| 1.3745 | 0.6352 | 265 | 0.9443 | 13548132 |
| 1.3198 | 0.6471 | 270 | 0.9442 | 13805636 |
| 1.2179 | 0.6591 | 275 | 0.9436 | 14058880 |
| 1.4035 | 0.6711 | 280 | 0.9463 | 14318400 |
| 1.2952 | 0.6831 | 285 | 0.9440 | 14568908 |
| 1.291 | 0.6951 | 290 | 0.9439 | 14823440 |
| 1.4132 | 0.7071 | 295 | 0.9436 | 15082248 |
| 1.5722 | 0.7190 | 300 | 0.9429 | 15338164 |
| 1.2473 | 0.7310 | 305 | 0.9416 | 15601888 |
| 1.2805 | 0.7430 | 310 | 0.9420 | 15855996 |
| 1.1853 | 0.7550 | 315 | 0.9401 | 16103316 |
| 1.4429 | 0.7670 | 320 | 0.9411 | 16354352 |
| 1.0744 | 0.7790 | 325 | 0.9417 | 16609264 |
| 1.2779 | 0.7910 | 330 | 0.9432 | 16869072 |
| 1.4178 | 0.8029 | 335 | 0.9407 | 17125932 |
| 1.3986 | 0.8149 | 340 | 0.9414 | 17379164 |
| 1.1471 | 0.8269 | 345 | 0.9404 | 17628696 |
| 1.1763 | 0.8389 | 350 | 0.9426 | 17884156 |
| 1.2251 | 0.8509 | 355 | 0.9389 | 18134160 |
| 1.2366 | 0.8629 | 360 | 0.9409 | 18391736 |
| 1.3086 | 0.8748 | 365 | 0.9392 | 18644984 |
| 1.2506 | 0.8868 | 370 | 0.9405 | 18902772 |
| 1.355 | 0.8988 | 375 | 0.9384 | 19165216 |
| 1.3424 | 0.9108 | 380 | 0.9400 | 19415060 |
| 1.3585 | 0.9228 | 385 | 0.9390 | 19668820 |
| 1.3487 | 0.9348 | 390 | 0.9425 | 19922732 |
| 1.4113 | 0.9467 | 395 | 0.9402 | 20187160 |
| 1.5089 | 0.9587 | 400 | 0.9377 | 20438732 |
| 1.3723 | 0.9707 | 405 | 0.9376 | 20699200 |
| 1.2797 | 0.9827 | 410 | 0.9422 | 20957600 |
| 1.3996 | 0.9947 | 415 | 0.9367 | 21217992 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
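As a quick check that the checkpoint loads with the library versions listed above, here is a minimal inference sketch; the repository id is assumed from the model name and may differ from the actual hosted location:

```python
# Minimal loading/inference sketch for this fine-tuned Gemma-2-27B checkpoint.
# Assumption: the repository id below is inferred from the model name and is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 is typically run in bfloat16
    device_map="auto",           # requires accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```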