
# collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.2287
- Num Input Tokens Seen: 4938184

## Model description

More information needed

## Intended uses & limitations

More information needed
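
The card gives no usage guidance, so here is a minimal inference sketch, assuming the checkpoint loads as a standard `transformers` causal LM (the prompt is an arbitrary example, not from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The repository stores weights in BF16, so load in that dtype.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```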

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
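
For reproduction, these settings map onto `transformers.TrainingArguments` roughly as sketched below. This is a hypothetical reconstruction, not the author's actual training script; the `output_dir` and any model/data wiring are placeholders.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```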

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3396        | 0.0533 | 5    | 1.2821          | 269216            |
| 1.0707        | 0.1067 | 10   | 1.2356          | 537456            |
| 0.9303        | 0.16   | 15   | 1.2354          | 795928            |
| 0.8553        | 0.2133 | 20   | 1.2527          | 1054264           |
| 0.8268        | 0.2667 | 25   | 1.2553          | 1324584           |
| 0.7279        | 0.32   | 30   | 1.2700          | 1597000           |
| 0.5158        | 0.3733 | 35   | 1.2862          | 1858032           |
| 0.5511        | 0.4267 | 40   | 1.2565          | 2122448           |
| 0.5151        | 0.48   | 45   | 1.2456          | 2386632           |
| 0.5688        | 0.5333 | 50   | 1.2360          | 2651920           |
| 0.408         | 0.5867 | 55   | 1.2481          | 2923680           |
| 0.4403        | 0.64   | 60   | 1.2211          | 3186272           |
| 0.3863        | 0.6933 | 65   | 1.2360          | 3456024           |
| 0.4065        | 0.7467 | 70   | 1.2128          | 3727192           |
| 0.4249        | 0.8    | 75   | 1.2300          | 3989832           |
| 0.4252        | 0.8533 | 80   | 1.2140          | 4250048           |
| 0.3838        | 0.9067 | 85   | 1.2314          | 4509488           |
| 0.4182        | 0.96   | 90   | 1.2114          | 4776720           |
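
Assuming the validation loss above is the usual mean token-level cross-entropy in nats, perplexity is its exponential; a quick sanity check on the final eval loss:

```python
import math

# Reported final evaluation loss from this card.
final_eval_loss = 1.2287

# Perplexity = exp(cross-entropy in nats per token).
print(f"perplexity ~= {math.exp(final_eval_loss):.2f}")  # ~= 3.42
```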

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
