collapse_gemma-2-2b_hs2_replace_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4692
  • Num Input Tokens Seen: 8037456
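The card does not include a usage snippet. A minimal sketch for loading the checkpoint with the `transformers` library, assuming the weights are published on the Hub under the repo id `jkazdan/collapse_gemma-2-2b_hs2_replace_iter2_sftsd1` and load like any other Gemma-2 causal LM:

```python
# Hypothetical usage sketch, not part of the original card.
# Assumption: the weights are hosted on the Hugging Face Hub under the id below.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter2_sftsd1"

def generate_sample(prompt: str, max_new_tokens: int = 32) -> str:
    """Download the ~2.6B-parameter checkpoint and generate a short continuation.

    Imports are deferred so the snippet can be inspected without pulling
    in torch/transformers up front.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `generate_sample("Hello")` triggers the full model download, so it is best run on a machine with a GPU and sufficient disk space.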

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
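The listed total_train_batch_size of 128 is not an independent setting; it is the per-device batch size multiplied by the gradient accumulation steps. A small sketch restating the hyperparameters above as a plain dict (the dict itself is illustrative, the values are copied from the list):

```python
# Illustrative restatement of the training hyperparameters above.
config = {
    "learning_rate": 8e-06,
    "train_batch_size": 8,           # per-device
    "eval_batch_size": 16,
    "seed": 1,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

# Effective (total) train batch size = per-device batch x accumulation steps.
effective_batch = config["train_batch_size"] * config["gradient_accumulation_steps"]
print(effective_batch)  # → 128
```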

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6452        | 0.0346 | 5    | 1.3058          | 278656            |
| 1.3928        | 0.0692 | 10   | 1.2047          | 558136            |
| 1.3648        | 0.1038 | 15   | 1.1749          | 828560            |
| 1.1606        | 0.1383 | 20   | 1.1904          | 1108192           |
| 0.9722        | 0.1729 | 25   | 1.2270          | 1388824           |
| 0.8185        | 0.2075 | 30   | 1.3096          | 1665952           |
| 0.6731        | 0.2421 | 35   | 1.3850          | 1950056           |
| 0.5466        | 0.2767 | 40   | 1.4228          | 2228112           |
| 0.4882        | 0.3113 | 45   | 1.5026          | 2510296           |
| 0.4421        | 0.3459 | 50   | 1.4879          | 2795248           |
| 0.3396        | 0.3805 | 55   | 1.4673          | 3079696           |
| 0.2269        | 0.4150 | 60   | 1.5111          | 3363824           |
| 0.2738        | 0.4496 | 65   | 1.4618          | 3641232           |
| 0.3523        | 0.4842 | 70   | 1.4619          | 3912784           |
| 0.2859        | 0.5188 | 75   | 1.4459          | 4191336           |
| 0.1768        | 0.5534 | 80   | 1.4447          | 4471320           |
| 0.1786        | 0.5880 | 85   | 1.4194          | 4751488           |
| 0.1399        | 0.6226 | 90   | 1.4671          | 5027576           |
| 0.1653        | 0.6572 | 95   | 1.4218          | 5303728           |
| 0.1802        | 0.6917 | 100  | 1.4062          | 5582944           |
| 0.1076        | 0.7263 | 105  | 1.4140          | 5859696           |
| 0.16          | 0.7609 | 110  | 1.4109          | 6130248           |
| 0.0994        | 0.7955 | 115  | 1.4027          | 6409520           |
| 0.0925        | 0.8301 | 120  | 1.4232          | 6691040           |
| 0.114         | 0.8647 | 125  | 1.4581          | 6969344           |
| 0.1758        | 0.8993 | 130  | 1.4161          | 7248104           |
| 0.0975        | 0.9339 | 135  | 1.4421          | 7526360           |
| 0.1121        | 0.9684 | 140  | 1.4560          | 7808664           |
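The logged validation losses are easy to scan programmatically. A minimal sketch using a few of the (step, validation loss) pairs copied from the table above:

```python
# A subset of the (step, validation_loss) pairs from the training-results table.
val_losses = [
    (0, 1.3956),
    (5, 1.3058),
    (10, 1.2047),
    (15, 1.1749),
    (20, 1.1904),
    (45, 1.5026),
    (85, 1.4194),
    (140, 1.4560),
]

# Find the step with the lowest logged validation loss.
best_step, best_loss = min(val_losses, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 15 1.1749
```

The scan makes the trend easy to spot: validation loss bottoms out at step 15 (epoch ~0.10) and then drifts upward even as training loss keeps falling, which is consistent with the final evaluation loss of 1.4692 reported above.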

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1