Baby-Llama-58M-RUN3_3

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.8148
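
For scale, if this is the usual mean per-token cross-entropy loss in nats (the standard Trainer convention; the card does not say), it corresponds to a perplexity of roughly exp(3.8148) ≈ 45.4. A minimal check:

```python
import math

# Assuming the reported evaluation loss is mean per-token
# cross-entropy in nats, perplexity is simply its exponential.
eval_loss = 3.8148
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 45.4
```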

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.00025
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 120
  • mixed_precision_training: Native AMP
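
A minimal sketch of how these settings could map onto Hugging Face `TrainingArguments`; the `output_dir` value and the `fp16` flag (standing in for "Native AMP") are assumptions, not taken from the card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="Baby-Llama-58M-RUN3_3",  # placeholder path, not from the card
    learning_rate=2.5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=120,
    fp16=True,  # assumed equivalent of "Native AMP" mixed precision
)
```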

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 297.4542      | 1.0   | 12   | 250.9910        |
| 229.6338      | 2.0   | 24   | 208.3821        |
| 208.295       | 3.0   | 36   | 179.5238        |
| 129.018       | 4.0   | 48   | 112.9940        |
| 82.9929       | 5.0   | 60   | 74.3020         |
| 46.9522       | 6.0   | 72   | 42.2297         |
| 24.9202       | 7.0   | 84   | 23.4095         |
| 15.2942       | 8.0   | 96   | 13.3510         |
| 10.0619       | 9.0   | 108  | 9.7284          |
| 7.784         | 10.0  | 120  | 7.8737          |
| 6.4759        | 11.0  | 132  | 7.2488          |
| 6.1744        | 12.0  | 144  | 6.3695          |
| 5.4904        | 13.0  | 156  | 6.2293          |
| 5.4665        | 14.0  | 168  | 5.8846          |
| 4.731         | 15.0  | 180  | 5.8094          |
| 4.7619        | 16.0  | 192  | 5.4680          |
| 4.6858        | 17.0  | 204  | 5.4562          |
| 4.594         | 18.0  | 216  | 5.2367          |
| 4.7173        | 19.0  | 228  | 5.1584          |
| 4.2267        | 20.0  | 240  | 5.1182          |
| 4.2401        | 21.0  | 252  | 5.0173          |
| 4.767         | 22.0  | 264  | 4.9806          |
| 4.0932        | 23.0  | 276  | 4.8975          |
| 4.3266        | 24.0  | 288  | 4.8852          |
| 4.0103        | 25.0  | 300  | 4.7698          |
| 4.1829        | 26.0  | 312  | 4.7993          |
| 4.0862        | 27.0  | 324  | 4.7921          |
| 4.1418        | 28.0  | 336  | 4.7469          |
| 4.0668        | 29.0  | 348  | 4.7108          |
| 4.0318        | 30.0  | 360  | 4.6335          |
| 4.0468        | 31.0  | 372  | 4.6761          |
| 3.9454        | 32.0  | 384  | 4.5814          |
| 3.943         | 33.0  | 396  | 4.5624          |
| 3.5406        | 34.0  | 408  | 4.6243          |
| 3.5091        | 35.0  | 420  | 4.5822          |
| 3.5972        | 36.0  | 432  | 4.4551          |
| 3.711         | 37.0  | 444  | 4.4898          |
| 3.7391        | 38.0  | 456  | 4.4472          |
| 3.7883        | 39.0  | 468  | 4.4188          |
| 3.7508        | 40.0  | 480  | 4.3803          |
| 3.422         | 41.0  | 492  | 4.3539          |
| 3.5801        | 42.0  | 504  | 4.3718          |
| 3.3411        | 43.0  | 516  | 4.3635          |
| 3.5347        | 44.0  | 528  | 4.3381          |
| 3.3136        | 45.0  | 540  | 4.2857          |
| 3.6378        | 46.0  | 552  | 4.2428          |
| 3.9194        | 47.0  | 564  | 4.3143          |
| 3.444         | 48.0  | 576  | 4.2403          |
| 3.5414        | 49.0  | 588  | 4.2614          |
| 3.6703        | 50.0  | 600  | 4.2729          |
| 3.5997        | 51.0  | 612  | 4.2104          |
| 3.1202        | 52.0  | 624  | 4.1948          |
| 3.3409        | 53.0  | 636  | 4.2018          |
| 3.4611        | 54.0  | 648  | 4.1726          |
| 3.1643        | 55.0  | 660  | 4.1776          |
| 3.1082        | 56.0  | 672  | 4.1785          |
| 2.9745        | 57.0  | 684  | 4.1374          |
| 3.3937        | 58.0  | 696  | 4.1434          |
| 3.265         | 59.0  | 708  | 4.1356          |
| 3.0267        | 60.0  | 720  | 4.1474          |
| 3.0632        | 61.0  | 732  | 4.1193          |
| 3.3543        | 62.0  | 744  | 4.0760          |
| 3.519         | 63.0  | 756  | 4.1373          |
| 3.2546        | 64.0  | 768  | 4.0591          |
| 3.0835        | 65.0  | 780  | 4.0572          |
| 3.3228        | 66.0  | 792  | 4.0788          |
| 3.3441        | 67.0  | 804  | 4.0489          |
| 2.9186        | 68.0  | 816  | 4.0360          |
| 3.1519        | 69.0  | 828  | 4.0376          |
| 3.5119        | 70.0  | 840  | 4.0159          |
| 3.1155        | 71.0  | 852  | 4.0070          |
| 3.1899        | 72.0  | 864  | 3.9895          |
| 3.0979        | 73.0  | 876  | 3.9936          |
| 3.1709        | 74.0  | 888  | 3.9997          |
| 3.3529        | 75.0  | 900  | 3.9848          |
| 2.7989        | 76.0  | 912  | 3.9760          |
| 3.1918        | 77.0  | 924  | 3.9693          |
| 2.8472        | 78.0  | 936  | 3.9504          |
| 3.3493        | 79.0  | 948  | 3.9520          |
| 3.5098        | 80.0  | 960  | 3.9401          |
| 3.2381        | 81.0  | 972  | 3.9363          |
| 3.1959        | 82.0  | 984  | 3.9292          |
| 3.4514        | 83.0  | 996  | 3.9128          |
| 2.9119        | 84.0  | 1008 | 3.9194          |
| 3.2452        | 85.0  | 1020 | 3.9038          |
| 3.0657        | 86.0  | 1032 | 3.9168          |
| 2.8583        | 87.0  | 1044 | 3.9018          |
| 3.2229        | 88.0  | 1056 | 3.9000          |
| 2.9973        | 89.0  | 1068 | 3.8906          |
| 3.0533        | 90.0  | 1080 | 3.8818          |
| 3.3813        | 91.0  | 1092 | 3.8715          |
| 3.1559        | 92.0  | 1104 | 3.8639          |
| 3.1343        | 93.0  | 1116 | 3.8674          |
| 2.9604        | 94.0  | 1128 | 3.8690          |
| 3.3522        | 95.0  | 1140 | 3.8646          |
| 2.9739        | 96.0  | 1152 | 3.8589          |
| 2.7854        | 97.0  | 1164 | 3.8559          |
| 2.8544        | 98.0  | 1176 | 3.8445          |
| 2.9875        | 99.0  | 1188 | 3.8434          |
| 3.3395        | 100.0 | 1200 | 3.8402          |
| 2.736         | 101.0 | 1212 | 3.8398          |
| 3.0598        | 102.0 | 1224 | 3.8384          |
| 3.003         | 103.0 | 1236 | 3.8376          |
| 3.0566        | 104.0 | 1248 | 3.8386          |
| 3.1727        | 105.0 | 1260 | 3.8281          |
| 2.9811        | 106.0 | 1272 | 3.8331          |
| 2.7108        | 107.0 | 1284 | 3.8224          |
| 2.6579        | 108.0 | 1296 | 3.8236          |
| 3.1319        | 109.0 | 1308 | 3.8197          |
| 3.1115        | 110.0 | 1320 | 3.8216          |
| 3.0955        | 111.0 | 1332 | 3.8181          |
| 2.6928        | 112.0 | 1344 | 3.8188          |
| 2.9943        | 113.0 | 1356 | 3.8147          |
| 3.0923        | 114.0 | 1368 | 3.8154          |
| 3.1913        | 115.0 | 1380 | 3.8156          |
| 2.9444        | 116.0 | 1392 | 3.8146          |
| 3.0491        | 117.0 | 1404 | 3.8141          |
| 2.7357        | 118.0 | 1416 | 3.8148          |
| 3.0744        | 119.0 | 1428 | 3.8148          |
| 3.1122        | 120.0 | 1440 | 3.8148          |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
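
Given these pinned versions, a minimal loading sketch; the hub repository id below is an assumption based on the card title, so adjust it to the actual `user/model` path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id taken from the card title; replace with the
# actual path of this model on the Hugging Face Hub.
model_id = "Baby-Llama-58M-RUN3_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation as a smoke test.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```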

Model size

  • 46.5M parameters (Safetensors, F32 tensors)