train_qnli_1744902610
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the qnli dataset. It achieves the following results on the evaluation set:
- Loss: 0.0352
- Num Input Tokens Seen: 70340640
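As a sanity check, the reported token count is consistent with the training schedule below (40,000 optimizer steps at a total batch size of 16). The per-step and per-example rates here are derived from the card's numbers, not reported by the trainer:

```python
# Sanity-check the reported "Num Input Tokens Seen" against the schedule.
# total_tokens and total_steps are taken from this card; the rates below
# are derived values, not reported metrics.
total_tokens = 70_340_640   # "Num Input Tokens Seen"
total_steps = 40_000        # "training_steps"
total_batch = 16            # "total_train_batch_size"

tokens_per_step = total_tokens / total_steps
tokens_per_example = tokens_per_step / total_batch

print(f"{tokens_per_step:.1f} tokens per optimizer step")   # ~1758.5
print(f"{tokens_per_example:.1f} tokens per example")       # ~109.9
```

This also matches the table below, which logs roughly 352,000 additional tokens per 200 steps.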
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.3
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- training_steps: 40000
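The listed total train batch size follows from the per-device batch size and gradient accumulation (assuming single-device training, as the card lists no multi-GPU settings):

```python
# Effective (total) train batch size = per-device batch size
# x gradient accumulation steps, using the hyperparameters above.
train_batch_size = 4
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 16, matching total_train_batch_size above
```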
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
0.1582 | 0.0339 | 200 | 0.1552 | 354016 |
0.1574 | 0.0679 | 400 | 0.1554 | 710048 |
0.1562 | 0.1018 | 600 | 0.1537 | 1061568 |
0.1559 | 0.1358 | 800 | 0.1524 | 1413312 |
0.1612 | 0.1697 | 1000 | 0.1514 | 1761440 |
0.1536 | 0.2037 | 1200 | 0.1505 | 2116800 |
0.1511 | 0.2376 | 1400 | 0.1505 | 2469600 |
0.1503 | 0.2716 | 1600 | 0.1529 | 2820672 |
0.5609 | 0.3055 | 1800 | 0.4726 | 3173888 |
0.1593 | 0.3395 | 2000 | 0.1702 | 3528672 |
0.155 | 0.3734 | 2200 | 0.1545 | 3885024 |
0.1629 | 0.4073 | 2400 | 0.1603 | 4234912 |
0.1614 | 0.4413 | 2600 | 0.1558 | 4585440 |
0.1545 | 0.4752 | 2800 | 0.1607 | 4936320 |
0.1657 | 0.5092 | 3000 | 0.1801 | 5287360 |
0.1561 | 0.5431 | 3200 | 0.1570 | 5634432 |
0.1629 | 0.5771 | 3400 | 0.1636 | 5985504 |
0.1545 | 0.6110 | 3600 | 0.1551 | 6339072 |
0.1599 | 0.6450 | 3800 | 0.1516 | 6695840 |
0.1501 | 0.6789 | 4000 | 0.1550 | 7045536 |
0.1493 | 0.7129 | 4200 | 0.1527 | 7399328 |
0.1552 | 0.7468 | 4400 | 0.1529 | 7749568 |
0.1495 | 0.7808 | 4600 | 0.1503 | 8099584 |
0.149 | 0.8147 | 4800 | 0.1582 | 8450752 |
0.1432 | 0.8486 | 5000 | 0.1493 | 8799616 |
0.1509 | 0.8826 | 5200 | 0.1504 | 9153824 |
0.1391 | 0.9165 | 5400 | 0.1608 | 9503040 |
0.1525 | 0.9505 | 5600 | 0.1512 | 9852032 |
0.1519 | 0.9844 | 5800 | 0.1484 | 10205248 |
0.1496 | 1.0183 | 6000 | 0.1485 | 10556224 |
0.1513 | 1.0523 | 6200 | 0.1561 | 10906176 |
0.1493 | 1.0862 | 6400 | 0.1508 | 11258848 |
0.14 | 1.1202 | 6600 | 0.1570 | 11612736 |
0.1493 | 1.1541 | 6800 | 0.1573 | 11965728 |
0.156 | 1.1881 | 7000 | 0.1511 | 12317792 |
0.1555 | 1.2220 | 7200 | 0.1498 | 12671680 |
0.149 | 1.2560 | 7400 | 0.1472 | 13026528 |
0.1583 | 1.2899 | 7600 | 0.1496 | 13377952 |
0.1623 | 1.3238 | 7800 | 0.1642 | 13731648 |
0.1339 | 1.3578 | 8000 | 0.1561 | 14079456 |
0.1441 | 1.3917 | 8200 | 0.1468 | 14433120 |
0.1615 | 1.4257 | 8400 | 0.1514 | 14785792 |
0.1396 | 1.4596 | 8600 | 0.1514 | 15133600 |
0.1461 | 1.4936 | 8800 | 0.1471 | 15482048 |
0.1612 | 1.5275 | 9000 | 0.1481 | 15833280 |
0.1482 | 1.5615 | 9200 | 0.1418 | 16184384 |
0.1564 | 1.5954 | 9400 | 0.1585 | 16532672 |
0.1478 | 1.6294 | 9600 | 0.1502 | 16886240 |
0.1602 | 1.6633 | 9800 | 0.1507 | 17236032 |
0.1246 | 1.6972 | 10000 | 0.1404 | 17589696 |
0.1334 | 1.7312 | 10200 | 0.1330 | 17939200 |
0.115 | 1.7651 | 10400 | 0.1248 | 18290848 |
0.1109 | 1.7991 | 10600 | 0.1183 | 18643136 |
0.0944 | 1.8330 | 10800 | 0.1122 | 18992096 |
0.0957 | 1.8670 | 11000 | 0.1116 | 19348352 |
0.1179 | 1.9009 | 11200 | 0.1110 | 19697728 |
0.1065 | 1.9349 | 11400 | 0.1125 | 20045504 |
0.1074 | 1.9688 | 11600 | 0.1127 | 20399360 |
0.1213 | 2.0027 | 11800 | 0.1176 | 20752832 |
0.102 | 2.0367 | 12000 | 0.1083 | 21101696 |
0.1289 | 2.0706 | 12200 | 0.1117 | 21451008 |
0.1155 | 2.1046 | 12400 | 0.1072 | 21798496 |
0.1109 | 2.1385 | 12600 | 0.1081 | 22148736 |
0.0942 | 2.1724 | 12800 | 0.1081 | 22497472 |
0.1111 | 2.2064 | 13000 | 0.1063 | 22847840 |
0.1016 | 2.2403 | 13200 | 0.1055 | 23198880 |
0.1004 | 2.2743 | 13400 | 0.1057 | 23551168 |
0.1047 | 2.3082 | 13600 | 0.1182 | 23901824 |
0.115 | 2.3422 | 13800 | 0.1069 | 24252256 |
0.1013 | 2.3761 | 14000 | 0.1086 | 24605280 |
0.1194 | 2.4101 | 14200 | 0.1047 | 24958496 |
0.1023 | 2.4440 | 14400 | 0.1023 | 25308416 |
0.091 | 2.4780 | 14600 | 0.1022 | 25656320 |
0.1265 | 2.5119 | 14800 | 0.1102 | 26010304 |
0.1041 | 2.5458 | 15000 | 0.0987 | 26367744 |
0.1029 | 2.5798 | 15200 | 0.1006 | 26720128 |
0.0986 | 2.6137 | 15400 | 0.1012 | 27068064 |
0.1176 | 2.6477 | 15600 | 0.0997 | 27423584 |
0.0996 | 2.6816 | 15800 | 0.0987 | 27776768 |
0.1052 | 2.7156 | 16000 | 0.0994 | 28126112 |
0.0994 | 2.7495 | 16200 | 0.0980 | 28482048 |
0.0809 | 2.7835 | 16400 | 0.1002 | 28833568 |
0.1015 | 2.8174 | 16600 | 0.1012 | 29184928 |
0.1111 | 2.8514 | 16800 | 0.0947 | 29539168 |
0.0969 | 2.8853 | 17000 | 0.0946 | 29890368 |
0.0958 | 2.9193 | 17200 | 0.0954 | 30246816 |
0.0889 | 2.9532 | 17400 | 0.1013 | 30598112 |
0.1049 | 2.9871 | 17600 | 0.1003 | 30947904 |
0.0809 | 3.0210 | 17800 | 0.0730 | 31297696 |
0.0538 | 3.0550 | 18000 | 0.0743 | 31650784 |
0.0963 | 3.0889 | 18200 | 0.0570 | 32003328 |
0.0387 | 3.1229 | 18400 | 0.0532 | 32350432 |
0.0632 | 3.1568 | 18600 | 0.0577 | 32702560 |
0.0514 | 3.1908 | 18800 | 0.0468 | 33054016 |
0.0488 | 3.2247 | 19000 | 0.0572 | 33410080 |
0.0602 | 3.2587 | 19200 | 0.0471 | 33764032 |
0.0509 | 3.2926 | 19400 | 0.0452 | 34116160 |
0.0564 | 3.3266 | 19600 | 0.0515 | 34470432 |
0.0454 | 3.3605 | 19800 | 0.0455 | 34821536 |
0.0487 | 3.3944 | 20000 | 0.0431 | 35169856 |
0.0533 | 3.4284 | 20200 | 0.0424 | 35520544 |
0.0566 | 3.4623 | 20400 | 0.0433 | 35874144 |
0.0395 | 3.4963 | 20600 | 0.0464 | 36225408 |
0.0505 | 3.5302 | 20800 | 0.0408 | 36573536 |
0.0302 | 3.5642 | 21000 | 0.0477 | 36926144 |
0.0402 | 3.5981 | 21200 | 0.0424 | 37277024 |
0.0421 | 3.6321 | 21400 | 0.0402 | 37630272 |
0.0479 | 3.6660 | 21600 | 0.0390 | 37979008 |
0.0505 | 3.7000 | 21800 | 0.0400 | 38328768 |
0.0312 | 3.7339 | 22000 | 0.0389 | 38679040 |
0.0363 | 3.7679 | 22200 | 0.0388 | 39032192 |
0.0394 | 3.8018 | 22400 | 0.0394 | 39381632 |
0.0319 | 3.8357 | 22600 | 0.0379 | 39732416 |
0.03 | 3.8697 | 22800 | 0.0405 | 40083328 |
0.0442 | 3.9036 | 23000 | 0.0397 | 40439264 |
0.0328 | 3.9376 | 23200 | 0.0386 | 40789056 |
0.0274 | 3.9715 | 23400 | 0.0449 | 41141280 |
0.0232 | 4.0054 | 23600 | 0.0377 | 41495616 |
0.0216 | 4.0394 | 23800 | 0.0373 | 41845216 |
0.0521 | 4.0733 | 24000 | 0.0384 | 42198656 |
0.0402 | 4.1073 | 24200 | 0.0394 | 42548064 |
0.0231 | 4.1412 | 24400 | 0.0381 | 42897248 |
0.0282 | 4.1752 | 24600 | 0.0373 | 43253728 |
0.0225 | 4.2091 | 24800 | 0.0389 | 43608032 |
0.0163 | 4.2431 | 25000 | 0.0375 | 43958240 |
0.0381 | 4.2770 | 25200 | 0.0366 | 44310560 |
0.0464 | 4.3109 | 25400 | 0.0382 | 44662688 |
0.0234 | 4.3449 | 25600 | 0.0369 | 45016000 |
0.0522 | 4.3788 | 25800 | 0.0387 | 45365856 |
0.0318 | 4.4128 | 26000 | 0.0396 | 45716576 |
0.0582 | 4.4467 | 26200 | 0.0380 | 46068320 |
0.0317 | 4.4807 | 26400 | 0.0392 | 46416928 |
0.0519 | 4.5146 | 26600 | 0.0372 | 46771968 |
0.0164 | 4.5486 | 26800 | 0.0361 | 47123552 |
0.0172 | 4.5825 | 27000 | 0.0359 | 47476256 |
0.0264 | 4.6165 | 27200 | 0.0381 | 47831136 |
0.0413 | 4.6504 | 27400 | 0.0361 | 48181856 |
0.0268 | 4.6843 | 27600 | 0.0363 | 48531648 |
0.0296 | 4.7183 | 27800 | 0.0358 | 48881728 |
0.0398 | 4.7522 | 28000 | 0.0358 | 49229248 |
0.0183 | 4.7862 | 28200 | 0.0362 | 49577952 |
0.0283 | 4.8201 | 28400 | 0.0371 | 49930752 |
0.0406 | 4.8541 | 28600 | 0.0355 | 50282304 |
0.0341 | 4.8880 | 28800 | 0.0358 | 50635840 |
0.0484 | 4.9220 | 29000 | 0.0364 | 50990240 |
0.0458 | 4.9559 | 29200 | 0.0357 | 51342976 |
0.0386 | 4.9899 | 29400 | 0.0354 | 51696320 |
0.0312 | 5.0238 | 29600 | 0.0363 | 52045952 |
0.0099 | 5.0577 | 29800 | 0.0366 | 52399008 |
0.0191 | 5.0917 | 30000 | 0.0355 | 52748704 |
0.023 | 5.1256 | 30200 | 0.0361 | 53098368 |
0.0704 | 5.1595 | 30400 | 0.0369 | 53449792 |
0.0313 | 5.1935 | 30600 | 0.0364 | 53800640 |
0.0213 | 5.2274 | 30800 | 0.0369 | 54151264 |
0.0128 | 5.2614 | 31000 | 0.0360 | 54498144 |
0.0319 | 5.2953 | 31200 | 0.0358 | 54846400 |
0.0283 | 5.3293 | 31400 | 0.0357 | 55200448 |
0.0499 | 5.3632 | 31600 | 0.0368 | 55550048 |
0.027 | 5.3972 | 31800 | 0.0368 | 55901856 |
0.0328 | 5.4311 | 32000 | 0.0356 | 56259904 |
0.0328 | 5.4651 | 32200 | 0.0357 | 56615008 |
0.0373 | 5.4990 | 32400 | 0.0355 | 56965760 |
0.0513 | 5.5329 | 32600 | 0.0353 | 57316960 |
0.0297 | 5.5669 | 32800 | 0.0365 | 57670080 |
0.0312 | 5.6008 | 33000 | 0.0357 | 58024256 |
0.0533 | 5.6348 | 33200 | 0.0356 | 58378976 |
0.045 | 5.6687 | 33400 | 0.0359 | 58733184 |
0.0473 | 5.7027 | 33600 | 0.0353 | 59085760 |
0.0421 | 5.7366 | 33800 | 0.0355 | 59438720 |
0.0118 | 5.7706 | 34000 | 0.0359 | 59794048 |
0.0422 | 5.8045 | 34200 | 0.0355 | 60144576 |
0.0266 | 5.8385 | 34400 | 0.0359 | 60495264 |
0.0303 | 5.8724 | 34600 | 0.0352 | 60843616 |
0.0149 | 5.9064 | 34800 | 0.0355 | 61196096 |
0.0692 | 5.9403 | 35000 | 0.0354 | 61549696 |
0.0338 | 5.9742 | 35200 | 0.0359 | 61901760 |
0.0193 | 6.0081 | 35400 | 0.0359 | 62248640 |
0.0167 | 6.0421 | 35600 | 0.0359 | 62595488 |
0.0324 | 6.0760 | 35800 | 0.0360 | 62948736 |
0.0302 | 6.1100 | 36000 | 0.0361 | 63302496 |
0.0266 | 6.1439 | 36200 | 0.0360 | 63654144 |
0.0082 | 6.1779 | 36400 | 0.0365 | 64010336 |
0.0253 | 6.2118 | 36600 | 0.0363 | 64362880 |
0.0196 | 6.2458 | 36800 | 0.0364 | 64717120 |
0.0193 | 6.2797 | 37000 | 0.0364 | 65067680 |
0.044 | 6.3137 | 37200 | 0.0363 | 65417376 |
0.0323 | 6.3476 | 37400 | 0.0361 | 65768416 |
0.0154 | 6.3816 | 37600 | 0.0361 | 66122624 |
0.0165 | 6.4155 | 37800 | 0.0361 | 66474048 |
0.018 | 6.4494 | 38000 | 0.0362 | 66825760 |
0.0124 | 6.4834 | 38200 | 0.0362 | 67179008 |
0.0118 | 6.5173 | 38400 | 0.0362 | 67533344 |
0.012 | 6.5513 | 38600 | 0.0362 | 67884864 |
0.0305 | 6.5852 | 38800 | 0.0362 | 68234656 |
0.0288 | 6.6192 | 39000 | 0.0362 | 68586432 |
0.0347 | 6.6531 | 39200 | 0.0362 | 68938688 |
0.0383 | 6.6871 | 39400 | 0.0362 | 69288384 |
0.0144 | 6.7210 | 39600 | 0.0361 | 69637472 |
0.011 | 6.7550 | 39800 | 0.0362 | 69989056 |
0.0187 | 6.7889 | 40000 | 0.0362 | 70340640 |
Framework versions
- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
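Since PEFT appears in the framework versions, this checkpoint is presumably a PEFT adapter on top of the base model rather than full weights. A minimal loading sketch under that assumption (requires access to the gated Llama 3 base weights; not a verified usage recipe for this repo):

```python
# Minimal sketch: load the base model, then apply this PEFT adapter.
# Assumes the checkpoint is a PEFT adapter; adjust device_map/dtype as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_qnli_1744902610"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
```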
Model tree for rbelanec/train_qnli_1744902610
- Base model: meta-llama/Meta-Llama-3-8B-Instruct