---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2
    results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1032
- Num Input Tokens Seen: 21819352
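Since the card does not yet document usage, here is a minimal loading sketch. It assumes the repo id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2` (the author and model name shown on this card) and uses only the standard `transformers` text-generation API:

```python
# Minimal usage sketch; the repo id below is an assumption based on the
# author and model name shown on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```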

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
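As a reference, the hyperparameters above correspond roughly to the following `transformers` `TrainingArguments`. This is a sketch, not the author's script: the dataset, model loading, and the exact TRL trainer wrapper used here are not documented on this card, and only the listed values are taken from it.

```python
# Hedged sketch of a configuration matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    eval_strategy="steps",            # evaluation every 5 steps, per the
    eval_steps=5,                     # training results table below
)
```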

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5432        | 0.0127 | 5    | 1.3798          | 276728            |
| 1.4208        | 0.0254 | 10   | 1.2917          | 554256            |
| 1.4236        | 0.0381 | 15   | 1.2111          | 833616            |
| 1.3033        | 0.0508 | 20   | 1.1647          | 1109744           |
| 1.2167        | 0.0634 | 25   | 1.1518          | 1384696           |
| 1.0953        | 0.0761 | 30   | 1.1341          | 1664008           |
| 0.9168        | 0.0888 | 35   | 1.1461          | 1944176           |
| 0.9273        | 0.1015 | 40   | 1.1542          | 2218368           |
| 0.8943        | 0.1142 | 45   | 1.1696          | 2492552           |
| 0.8168        | 0.1269 | 50   | 1.1792          | 2773488           |
| 0.7781        | 0.1396 | 55   | 1.1739          | 3050208           |
| 0.8131        | 0.1523 | 60   | 1.1845          | 3326584           |
| 0.6973        | 0.1649 | 65   | 1.1836          | 3606104           |
| 0.7054        | 0.1776 | 70   | 1.1733          | 3887952           |
| 0.685         | 0.1903 | 75   | 1.1764          | 4170752           |
| 0.5768        | 0.2030 | 80   | 1.1771          | 4444816           |
| 0.6494        | 0.2157 | 85   | 1.1719          | 4718552           |
| 0.5484        | 0.2284 | 90   | 1.1698          | 4998784           |
| 0.5609        | 0.2411 | 95   | 1.1739          | 5274536           |
| 0.4343        | 0.2538 | 100  | 1.1755          | 5553760           |
| 0.5656        | 0.2665 | 105  | 1.1654          | 5828328           |
| 0.5633        | 0.2791 | 110  | 1.1696          | 6104712           |
| 0.4485        | 0.2918 | 115  | 1.1631          | 6380840           |
| 0.4853        | 0.3045 | 120  | 1.1658          | 6651752           |
| 0.4552        | 0.3172 | 125  | 1.1593          | 6928872           |
| 0.4465        | 0.3299 | 130  | 1.1584          | 7200200           |
| 0.4402        | 0.3426 | 135  | 1.1605          | 7481976           |
| 0.4228        | 0.3553 | 140  | 1.1536          | 7765000           |
| 0.5075        | 0.3680 | 145  | 1.1529          | 8037040           |
| 0.3783        | 0.3807 | 150  | 1.1505          | 8313288           |
| 0.4           | 0.3933 | 155  | 1.1464          | 8593584           |
| 0.4482        | 0.4060 | 160  | 1.1507          | 8869384           |
| 0.4995        | 0.4187 | 165  | 1.1418          | 9145296           |
| 0.4386        | 0.4314 | 170  | 1.1420          | 9423816           |
| 0.3944        | 0.4441 | 175  | 1.1406          | 9707024           |
| 0.5069        | 0.4568 | 180  | 1.1408          | 9977424           |
| 0.36          | 0.4695 | 185  | 1.1408          | 10247568          |
| 0.4558        | 0.4822 | 190  | 1.1369          | 10525312          |
| 0.4699        | 0.4948 | 195  | 1.1341          | 10807080          |
| 0.5118        | 0.5075 | 200  | 1.1346          | 11075200          |
| 0.5246        | 0.5202 | 205  | 1.1310          | 11355128          |
| 0.5085        | 0.5329 | 210  | 1.1323          | 11635976          |
| 0.3497        | 0.5456 | 215  | 1.1290          | 11912608          |
| 0.4282        | 0.5583 | 220  | 1.1304          | 12191360          |
| 0.3405        | 0.5710 | 225  | 1.1261          | 12468896          |
| 0.4814        | 0.5837 | 230  | 1.1271          | 12748408          |
| 0.3857        | 0.5964 | 235  | 1.1262          | 13023016          |
| 0.4579        | 0.6090 | 240  | 1.1245          | 13302328          |
| 0.4054        | 0.6217 | 245  | 1.1244          | 13575408          |
| 0.4019        | 0.6344 | 250  | 1.1222          | 13851880          |
| 0.4085        | 0.6471 | 255  | 1.1206          | 14126456          |
| 0.3261        | 0.6598 | 260  | 1.1226          | 14411880          |
| 0.3434        | 0.6725 | 265  | 1.1197          | 14693704          |
| 0.3898        | 0.6852 | 270  | 1.1189          | 14972552          |
| 0.3275        | 0.6979 | 275  | 1.1202          | 15244856          |
| 0.3851        | 0.7105 | 280  | 1.1181          | 15517984          |
| 0.3896        | 0.7232 | 285  | 1.1167          | 15793480          |
| 0.4382        | 0.7359 | 290  | 1.1164          | 16072136          |
| 0.4112        | 0.7486 | 295  | 1.1147          | 16347632          |
| 0.4165        | 0.7613 | 300  | 1.1153          | 16622200          |
| 0.3549        | 0.7740 | 305  | 1.1137          | 16896656          |
| 0.3859        | 0.7867 | 310  | 1.1130          | 17175712          |
| 0.3636        | 0.7994 | 315  | 1.1129          | 17456320          |
| 0.4647        | 0.8121 | 320  | 1.1109          | 17735952          |
| 0.3973        | 0.8247 | 325  | 1.1121          | 18011048          |
| 0.3857        | 0.8374 | 330  | 1.1100          | 18285984          |
| 0.3692        | 0.8501 | 335  | 1.1105          | 18560024          |
| 0.4178        | 0.8628 | 340  | 1.1092          | 18834584          |
| 0.3232        | 0.8755 | 345  | 1.1070          | 19113832          |
| 0.3482        | 0.8882 | 350  | 1.1070          | 19390200          |
| 0.4256        | 0.9009 | 355  | 1.1065          | 19670664          |
| 0.4421        | 0.9136 | 360  | 1.1040          | 19946664          |
| 0.4513        | 0.9262 | 365  | 1.1046          | 20229584          |
| 0.395         | 0.9389 | 370  | 1.1059          | 20503736          |
| 0.3129        | 0.9516 | 375  | 1.1033          | 20776680          |
| 0.3915        | 0.9643 | 380  | 1.1048          | 21053616          |
| 0.3239        | 0.9770 | 385  | 1.1003          | 21327312          |
| 0.3765        | 0.9897 | 390  | 1.1039          | 21601936          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
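To reproduce results exactly, it can help to match this environment. A small sketch for checking the locally installed versions against the pins above (the version strings are taken from this card; everything else is standard library/package introspection):

```python
# Check that the local environment matches the versions this model was
# trained with; the expected version strings come from the card above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": ("4.44.0", transformers.__version__),
    "torch": ("2.4.0+cu121", torch.__version__),
    "datasets": ("2.20.0", datasets.__version__),
    "tokenizers": ("0.19.1", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    status = "OK" if have == want else f"mismatch (have {have})"
    print(f"{name}: expected {want} -> {status}")
```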