
collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 1.2131
  • Num Input Tokens Seen: 4981168
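The checkpoint can be loaded with the standard Transformers AutoClasses. Below is a minimal sketch, assuming the repo id listed in the model tree at the end of this card; the BF16 dtype matches the stored tensor type, and the prompt and generation settings are purely illustrative:

```python
# Minimal loading sketch; repo id taken from the model tree below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires accelerate; drop for CPU-only use
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```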

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
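
The training script itself is not published, so the following is only a sketch of how the listed settings map onto transformers.TrainingArguments; output_dir, bf16, and the eval/logging cadence are assumptions, not taken from the card:

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# output_dir, bf16, and the eval/logging cadence are assumptions; everything
# else mirrors the list in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumption, consistent with the BF16 checkpoint
    eval_strategy="steps",            # assumption, matches the 5-step cadence below
    eval_steps=5,                     # assumption
    logging_steps=5,                  # assumption
)
# The Trainer default optimizer (AdamW with betas=(0.9, 0.999) and eps=1e-8)
# matches the optimizer listed above.
```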

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3101        | 0.0528 | 5    | 1.2785          | 259360            |
| 1.0508        | 0.1057 | 10   | 1.2293          | 530168            |
| 0.9566        | 0.1585 | 15   | 1.2129          | 794520            |
| 0.8309        | 0.2114 | 20   | 1.2625          | 1058200           |
| 0.8029        | 0.2642 | 25   | 1.2461          | 1327328           |
| 0.6388        | 0.3170 | 30   | 1.2820          | 1587968           |
| 0.5711        | 0.3699 | 35   | 1.2793          | 1853352           |
| 0.5408        | 0.4227 | 40   | 1.2597          | 2118856           |
| 0.5223        | 0.4756 | 45   | 1.2438          | 2384304           |
| 0.4692        | 0.5284 | 50   | 1.2533          | 2651296           |
| 0.5336        | 0.5812 | 55   | 1.2343          | 2907944           |
| 0.4685        | 0.6341 | 60   | 1.2426          | 3175608           |
| 0.4822        | 0.6869 | 65   | 1.2253          | 3449616           |
| 0.4543        | 0.7398 | 70   | 1.2388          | 3719976           |
| 0.4193        | 0.7926 | 75   | 1.2284          | 3987200           |
| 0.4007        | 0.8454 | 80   | 1.2234          | 4247992           |
| 0.3711        | 0.8983 | 85   | 1.2196          | 4508112           |
| 0.4195        | 0.9511 | 90   | 1.2262          | 4769520           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd2

  • Base model: google/gemma-2-2b (this model is a fine-tune of it)