collapse_gemma-2-9b_hs2_replace_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6182
  • Num Input Tokens Seen: 4642096

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
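The effective batch size in the list above follows from the per-device batch size and gradient accumulation. A quick sanity check, assuming single-device training (the device count is not stated in this card):

```python
# Sanity-check the effective batch size implied by the hyperparameters above.
train_batch_size = 4             # per-device train batch size
gradient_accumulation_steps = 32
num_devices = 1                  # assumption: single GPU, not stated in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 128, matching total_train_batch_size above
```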

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1356        | 0.0513 | 5    | 1.0985          | 236640            |
| 0.5158        | 0.1027 | 10   | 1.1915          | 478096            |
| 0.1867        | 0.1540 | 15   | 1.3803          | 713224            |
| 0.0692        | 0.2053 | 20   | 1.4815          | 950388            |
| 0.0275        | 0.2567 | 25   | 1.3822          | 1195036           |
| 0.0254        | 0.3080 | 30   | 1.4820          | 1434668           |
| 0.0248        | 0.3593 | 35   | 1.5498          | 1669304           |
| 0.0226        | 0.4107 | 40   | 1.5781          | 1908432           |
| 0.0247        | 0.4620 | 45   | 1.5221          | 2150908           |
| 0.0224        | 0.5133 | 50   | 1.4731          | 2397208           |
| 0.0289        | 0.5646 | 55   | 1.4650          | 2634352           |
| 0.0217        | 0.6160 | 60   | 1.4817          | 2867608           |
| 0.0255        | 0.6673 | 65   | 1.5039          | 3114572           |
| 0.0207        | 0.7186 | 70   | 1.5013          | 3357172           |
| 0.0214        | 0.7700 | 75   | 1.4934          | 3593844           |
| 0.0231        | 0.8213 | 80   | 1.5160          | 3833908           |
| 0.0205        | 0.8726 | 85   | 1.5363          | 4071676           |
| 0.0219        | 0.9240 | 90   | 1.5761          | 4314868           |
| 0.0244        | 0.9753 | 95   | 1.6040          | 4546468           |
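For the constant_with_warmup schedule, the warmup length can be estimated from the table: step 95 corresponds to epoch 0.9753, so a full epoch is roughly 95 / 0.9753 optimizer steps. A rough estimate (not stated in the card, derived only from the logged steps and the warmup ratio above):

```python
import math

# Estimate warmup steps for the constant_with_warmup schedule.
# Step 95 corresponds to epoch 0.9753 in the results table, so one full
# epoch is roughly 95 / 0.9753 optimizer steps (an estimate).
steps_per_epoch = 95 / 0.9753
total_steps = math.ceil(steps_per_epoch * 1)   # num_epochs = 1
warmup_steps = math.ceil(0.05 * total_steps)   # lr_scheduler_warmup_ratio = 0.05

print(total_steps, warmup_steps)  # roughly 98 total steps, ~5 warmup steps
```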

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-9b_hs2_replace_iter5_sftsd1

  • Base model: google/gemma-2-9b