collapse_gemma-2-27b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3439
  • Num Input Tokens Seen: 3102268

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
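The effective batch size follows from the per-device batch size and gradient accumulation. A minimal sketch of the arithmetic, assuming single-device training (an assumption, but 4 × 32 matches the reported total of 128):

```python
# Effective train batch size (single device assumed; not stated in the card):
train_batch_size = 4              # per-device micro-batch
gradient_accumulation_steps = 32
num_devices = 1                   # assumption

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)     # 128, matching the reported value

# Warmup length under constant_with_warmup with warmup_ratio = 0.05:
# one epoch is 55 optimizer steps here, so warmup covers only the first couple of steps.
total_steps = 55
warmup_steps = int(0.05 * total_steps)
print(warmup_steps)               # 2
```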

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 4.2255        | 0.0908 | 5    | 1.0942          | 282080            |
| 4.0729        | 0.1816 | 10   | 1.1872          | 565872            |
| 3.8441        | 0.2724 | 15   | 1.2395          | 844816            |
| 3.6873        | 0.3632 | 20   | 1.3028          | 1127332           |
| 3.5368        | 0.4540 | 25   | 1.3094          | 1411096           |
| 3.3857        | 0.5448 | 30   | 1.3313          | 1691864           |
| 3.2977        | 0.6356 | 35   | 1.3384          | 1975384           |
| 3.3165        | 0.7264 | 40   | 1.3339          | 2258476           |
| 3.1495        | 0.8173 | 45   | 1.3452          | 2540700           |
| 3.1388        | 0.9081 | 50   | 1.3515          | 2820672           |
| 3.0977        | 0.9989 | 55   | 1.3439          | 3102268           |
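As a sanity check, the token counts above are consistent with the batch configuration. A back-of-the-envelope sketch (the average sequence length is an inference from the logged numbers, not a logged value):

```python
# Tokens seen per optimizer step, from the final row of the results table:
tokens_seen = 3102268
steps = 55
tokens_per_step = tokens_seen / steps        # ~56405

# Dividing by the effective batch size of 128 implies an average of
# roughly 441 tokens per training example (inferred, not logged).
avg_tokens_per_example = tokens_per_step / 128
print(round(tokens_per_step))                # 56405
print(round(avg_tokens_per_example))         # 441
```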

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter4_sftsd2

  • Base model: google/gemma-2-27b