Baby-Llama-58M-RUN3_3
This model is a fine-tuned version of an unspecified base model (the base checkpoint is not recorded in this card) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.8148
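The card carries no usage example, so here is a minimal loading sketch. The Hub repo id below is a placeholder (`your-username/Baby-Llama-58M-RUN3_3` is an assumption, not a published path); substitute the actual checkpoint location.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the real Hub path of this checkpoint.
model_id = "your-username/Baby-Llama-58M-RUN3_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```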
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.00025
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 120
- mixed_precision_training: Native AMP
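A minimal sketch of how these settings map onto `transformers.TrainingArguments`, assuming the standard `Trainer` setup that generates cards like this one; the output path is a placeholder, and per-epoch evaluation/logging is inferred from the results table below rather than stated in the card.

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters above onto TrainingArguments.
# The Adam betas/epsilon and seed shown are the Trainer defaults, which
# match the values listed in this card.
training_args = TrainingArguments(
    output_dir="Baby-Llama-58M-RUN3_3",  # placeholder output path
    learning_rate=2.5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=120,
    fp16=True,                            # "Native AMP" mixed precision
    evaluation_strategy="epoch",          # assumption: eval ran once per epoch (see table)
    logging_strategy="epoch",             # assumption: training loss logged per epoch
)
```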
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
297.4542 | 1.0 | 12 | 250.9910 |
229.6338 | 2.0 | 24 | 208.3821 |
208.295 | 3.0 | 36 | 179.5238 |
129.018 | 4.0 | 48 | 112.9940 |
82.9929 | 5.0 | 60 | 74.3020 |
46.9522 | 6.0 | 72 | 42.2297 |
24.9202 | 7.0 | 84 | 23.4095 |
15.2942 | 8.0 | 96 | 13.3510 |
10.0619 | 9.0 | 108 | 9.7284 |
7.784 | 10.0 | 120 | 7.8737 |
6.4759 | 11.0 | 132 | 7.2488 |
6.1744 | 12.0 | 144 | 6.3695 |
5.4904 | 13.0 | 156 | 6.2293 |
5.4665 | 14.0 | 168 | 5.8846 |
4.731 | 15.0 | 180 | 5.8094 |
4.7619 | 16.0 | 192 | 5.4680 |
4.6858 | 17.0 | 204 | 5.4562 |
4.594 | 18.0 | 216 | 5.2367 |
4.7173 | 19.0 | 228 | 5.1584 |
4.2267 | 20.0 | 240 | 5.1182 |
4.2401 | 21.0 | 252 | 5.0173 |
4.767 | 22.0 | 264 | 4.9806 |
4.0932 | 23.0 | 276 | 4.8975 |
4.3266 | 24.0 | 288 | 4.8852 |
4.0103 | 25.0 | 300 | 4.7698 |
4.1829 | 26.0 | 312 | 4.7993 |
4.0862 | 27.0 | 324 | 4.7921 |
4.1418 | 28.0 | 336 | 4.7469 |
4.0668 | 29.0 | 348 | 4.7108 |
4.0318 | 30.0 | 360 | 4.6335 |
4.0468 | 31.0 | 372 | 4.6761 |
3.9454 | 32.0 | 384 | 4.5814 |
3.943 | 33.0 | 396 | 4.5624 |
3.5406 | 34.0 | 408 | 4.6243 |
3.5091 | 35.0 | 420 | 4.5822 |
3.5972 | 36.0 | 432 | 4.4551 |
3.711 | 37.0 | 444 | 4.4898 |
3.7391 | 38.0 | 456 | 4.4472 |
3.7883 | 39.0 | 468 | 4.4188 |
3.7508 | 40.0 | 480 | 4.3803 |
3.422 | 41.0 | 492 | 4.3539 |
3.5801 | 42.0 | 504 | 4.3718 |
3.3411 | 43.0 | 516 | 4.3635 |
3.5347 | 44.0 | 528 | 4.3381 |
3.3136 | 45.0 | 540 | 4.2857 |
3.6378 | 46.0 | 552 | 4.2428 |
3.9194 | 47.0 | 564 | 4.3143 |
3.444 | 48.0 | 576 | 4.2403 |
3.5414 | 49.0 | 588 | 4.2614 |
3.6703 | 50.0 | 600 | 4.2729 |
3.5997 | 51.0 | 612 | 4.2104 |
3.1202 | 52.0 | 624 | 4.1948 |
3.3409 | 53.0 | 636 | 4.2018 |
3.4611 | 54.0 | 648 | 4.1726 |
3.1643 | 55.0 | 660 | 4.1776 |
3.1082 | 56.0 | 672 | 4.1785 |
2.9745 | 57.0 | 684 | 4.1374 |
3.3937 | 58.0 | 696 | 4.1434 |
3.265 | 59.0 | 708 | 4.1356 |
3.0267 | 60.0 | 720 | 4.1474 |
3.0632 | 61.0 | 732 | 4.1193 |
3.3543 | 62.0 | 744 | 4.0760 |
3.519 | 63.0 | 756 | 4.1373 |
3.2546 | 64.0 | 768 | 4.0591 |
3.0835 | 65.0 | 780 | 4.0572 |
3.3228 | 66.0 | 792 | 4.0788 |
3.3441 | 67.0 | 804 | 4.0489 |
2.9186 | 68.0 | 816 | 4.0360 |
3.1519 | 69.0 | 828 | 4.0376 |
3.5119 | 70.0 | 840 | 4.0159 |
3.1155 | 71.0 | 852 | 4.0070 |
3.1899 | 72.0 | 864 | 3.9895 |
3.0979 | 73.0 | 876 | 3.9936 |
3.1709 | 74.0 | 888 | 3.9997 |
3.3529 | 75.0 | 900 | 3.9848 |
2.7989 | 76.0 | 912 | 3.9760 |
3.1918 | 77.0 | 924 | 3.9693 |
2.8472 | 78.0 | 936 | 3.9504 |
3.3493 | 79.0 | 948 | 3.9520 |
3.5098 | 80.0 | 960 | 3.9401 |
3.2381 | 81.0 | 972 | 3.9363 |
3.1959 | 82.0 | 984 | 3.9292 |
3.4514 | 83.0 | 996 | 3.9128 |
2.9119 | 84.0 | 1008 | 3.9194 |
3.2452 | 85.0 | 1020 | 3.9038 |
3.0657 | 86.0 | 1032 | 3.9168 |
2.8583 | 87.0 | 1044 | 3.9018 |
3.2229 | 88.0 | 1056 | 3.9000 |
2.9973 | 89.0 | 1068 | 3.8906 |
3.0533 | 90.0 | 1080 | 3.8818 |
3.3813 | 91.0 | 1092 | 3.8715 |
3.1559 | 92.0 | 1104 | 3.8639 |
3.1343 | 93.0 | 1116 | 3.8674 |
2.9604 | 94.0 | 1128 | 3.8690 |
3.3522 | 95.0 | 1140 | 3.8646 |
2.9739 | 96.0 | 1152 | 3.8589 |
2.7854 | 97.0 | 1164 | 3.8559 |
2.8544 | 98.0 | 1176 | 3.8445 |
2.9875 | 99.0 | 1188 | 3.8434 |
3.3395 | 100.0 | 1200 | 3.8402 |
2.736 | 101.0 | 1212 | 3.8398 |
3.0598 | 102.0 | 1224 | 3.8384 |
3.003 | 103.0 | 1236 | 3.8376 |
3.0566 | 104.0 | 1248 | 3.8386 |
3.1727 | 105.0 | 1260 | 3.8281 |
2.9811 | 106.0 | 1272 | 3.8331 |
2.7108 | 107.0 | 1284 | 3.8224 |
2.6579 | 108.0 | 1296 | 3.8236 |
3.1319 | 109.0 | 1308 | 3.8197 |
3.1115 | 110.0 | 1320 | 3.8216 |
3.0955 | 111.0 | 1332 | 3.8181 |
2.6928 | 112.0 | 1344 | 3.8188 |
2.9943 | 113.0 | 1356 | 3.8147 |
3.0923 | 114.0 | 1368 | 3.8154 |
3.1913 | 115.0 | 1380 | 3.8156 |
2.9444 | 116.0 | 1392 | 3.8146 |
3.0491 | 117.0 | 1404 | 3.8141 |
2.7357 | 118.0 | 1416 | 3.8148 |
3.0744 | 119.0 | 1428 | 3.8148 |
3.1122 | 120.0 | 1440 | 3.8148 |
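For reference, a cross-entropy language-modeling loss converts to perplexity via exponentiation; a quick check on the final validation loss of 3.8148 (assuming the loss is the standard per-token loss in nats):

```python
import math

# Perplexity = exp(cross-entropy loss) for a per-token LM loss in nats.
final_eval_loss = 3.8148
print(math.exp(final_eval_loss))  # ~45.4
```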
Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0