---
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter4_sftsd2
    results: []
---

# collapse_gemma-2-27b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.3439
- Num Input Tokens Seen: 3102268

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
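
The effective batch size and warmup length implied by these settings can be sanity-checked with a few lines of arithmetic. This assumes a single training process (no data parallelism), which is consistent with total_train_batch_size equaling train_batch_size × gradient_accumulation_steps:

```python
import math

# Effective optimizer batch size, assuming one training process:
train_batch_size = 4
gradient_accumulation_steps = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128

# The final logged row reaches epoch ~1.0 at step 55, so one epoch is
# roughly 55 optimizer steps; a warmup_ratio of 0.05 then corresponds
# to about 3 warmup steps.
steps_per_epoch = 55
warmup_steps = math.ceil(0.05 * steps_per_epoch)
print(warmup_steps)  # 3
```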

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 4.2255        | 0.0908 | 5    | 1.0942          | 282080            |
| 4.0729        | 0.1816 | 10   | 1.1872          | 565872            |
| 3.8441        | 0.2724 | 15   | 1.2395          | 844816            |
| 3.6873        | 0.3632 | 20   | 1.3028          | 1127332           |
| 3.5368        | 0.4540 | 25   | 1.3094          | 1411096           |
| 3.3857        | 0.5448 | 30   | 1.3313          | 1691864           |
| 3.2977        | 0.6356 | 35   | 1.3384          | 1975384           |
| 3.3165        | 0.7264 | 40   | 1.3339          | 2258476           |
| 3.1495        | 0.8173 | 45   | 1.3452          | 2540700           |
| 3.1388        | 0.9081 | 50   | 1.3515          | 2820672           |
| 3.0977        | 0.9989 | 55   | 1.3439          | 3102268           |
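
The "Input Tokens Seen" column grows roughly linearly with step count, so average token throughput can be estimated from the final logged row. This is a back-of-the-envelope figure derived from the table above, not a value reported by the training run:

```python
# Estimate average tokens per optimizer step and per training example
# from the final logged row (step 55, 3102268 input tokens seen).
tokens_seen = 3102268
steps = 55
total_train_batch_size = 128

tokens_per_step = tokens_seen / steps
tokens_per_example = tokens_per_step / total_train_batch_size
print(round(tokens_per_step))     # ~56405 tokens per optimizer step
print(round(tokens_per_example))  # ~441 tokens per example
```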

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1