train_mnli_1744902585

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the MNLI dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3357
  • Num Input Tokens Seen: 62984280
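The framework versions below indicate this is a PEFT adapter rather than a full model. As a minimal, untested sketch of loading it for inference (assuming the adapter is published under the repo id rbelanec/train_mnli_1744902585 and that prompts follow an instruction-style NLI format, neither of which the card documents):

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the adapter together with its base model (meta-llama/Meta-Llama-3-8B-Instruct).
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_mnli_1744902585",  # assumed repo id
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical NLI-style prompt; the actual training prompt format is not documented.
prompt = (
    "Premise: A soccer game with multiple males playing.\n"
    "Hypothesis: Some men are playing a sport.\n"
    "Label (entailment, neutral, or contradiction):"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```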

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
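Pending that information, the dataset name above presumably refers to GLUE MNLI (roughly 393k premise/hypothesis pairs labeled entailment, neutral, or contradiction). A minimal sketch of loading it with the datasets library, purely as an assumption about the source data:

```python
from datasets import load_dataset

# GLUE MNLI ships train, validation_matched, and validation_mismatched splits.
mnli = load_dataset("glue", "mnli")
print(mnli["train"][0])
# -> {'premise': ..., 'hypothesis': ..., 'label': 0 | 1 | 2, 'idx': ...}
```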

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.3
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
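For reference, here is how these settings would map onto transformers.TrainingArguments. This is a hypothetical reconstruction, not the exact training script; output_dir and anything not listed above are placeholders:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_mnli_1744902585",  # placeholder
    learning_rate=0.3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,  # 4 per device x 4 accumulation steps = 16 total
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40000,
)
```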

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.6626 0.0091 200 0.6524 312896
0.7724 0.0181 400 0.6093 625472
0.3785 0.0272 600 0.3805 942656
0.4347 0.0362 800 0.3784 1256992
0.4175 0.0453 1000 0.3748 1572864
0.4516 0.0543 1200 0.3609 1889696
0.4132 0.0634 1400 0.4343 2203360
0.3945 0.0724 1600 0.3904 2524096
0.3702 0.0815 1800 0.4047 2837312
0.4426 0.0905 2000 0.4006 3152992
0.4405 0.0996 2200 0.4356 3466976
0.3683 0.1086 2400 0.3999 3784000
0.3224 0.1177 2600 0.4874 4100288
0.3747 0.1268 2800 0.3687 4417024
0.3558 0.1358 3000 0.3646 4730880
0.4067 0.1449 3200 0.3642 5046976
0.3608 0.1539 3400 0.4002 5361952
0.3815 0.1630 3600 0.3647 5680768
0.3862 0.1720 3800 0.4574 5996256
0.3587 0.1811 4000 0.3627 6311552
0.331 0.1901 4200 0.3791 6627776
0.335 0.1992 4400 0.3430 6946240
0.3485 0.2082 4600 0.3873 7260672
0.3224 0.2173 4800 0.3450 7574432
0.3566 0.2264 5000 0.3479 7890496
0.3319 0.2354 5200 0.3551 8202528
0.354 0.2445 5400 0.3949 8516928
0.3437 0.2535 5600 0.3447 8828000
0.3466 0.2626 5800 0.3427 9143776
0.3727 0.2716 6000 0.3672 9456800
0.3587 0.2807 6200 0.3495 9770496
0.3593 0.2897 6400 0.3426 10084544
0.3299 0.2988 6600 0.3832 10400832
0.3538 0.3078 6800 0.3558 10713664
0.3505 0.3169 7000 0.3447 11028672
0.3126 0.3259 7200 0.3760 11347104
0.3644 0.3350 7400 0.3538 11658304
0.3468 0.3441 7600 0.3482 11969312
0.328 0.3531 7800 0.3524 12283264
0.3273 0.3622 8000 0.3460 12595776
0.3537 0.3712 8200 0.3523 12911104
0.3386 0.3803 8400 0.3545 13225632
0.3321 0.3893 8600 0.3431 13544096
0.3541 0.3984 8800 0.3878 13857600
0.3815 0.4074 9000 0.3890 14172800
0.3723 0.4165 9200 0.3529 14487680
0.3816 0.4255 9400 0.3597 14807520
0.3156 0.4346 9600 0.3811 15117696
0.3731 0.4436 9800 0.3394 15433344
0.3529 0.4527 10000 0.3460 15748576
0.3473 0.4618 10200 0.3793 16064864
0.3345 0.4708 10400 0.3443 16386496
0.3708 0.4799 10600 0.3469 16700128
0.3233 0.4889 10800 0.3567 17015072
0.3279 0.4980 11000 0.3438 17334080
0.342 0.5070 11200 0.3467 17650336
0.3544 0.5161 11400 0.3380 17964032
0.371 0.5251 11600 0.3514 18280704
0.3684 0.5342 11800 0.3545 18595744
0.3302 0.5432 12000 0.3421 18906592
0.3526 0.5523 12200 0.3444 19223392
0.3347 0.5614 12400 0.3411 19535520
0.3183 0.5704 12600 0.3476 19848032
0.3117 0.5795 12800 0.3772 20163616
0.3384 0.5885 13000 0.3389 20479520
0.3399 0.5976 13200 0.3385 20792320
0.3515 0.6066 13400 0.3461 21105472
0.3329 0.6157 13600 0.3434 21418912
0.3379 0.6247 13800 0.3448 21740320
0.3307 0.6338 14000 0.3430 22051936
0.3258 0.6428 14200 0.3412 22365376
0.34 0.6519 14400 0.3407 22680000
0.3718 0.6609 14600 0.3461 22995520
0.3377 0.6700 14800 0.3446 23311072
0.3466 0.6791 15000 0.3420 23626112
0.3481 0.6881 15200 0.3475 23937568
0.3298 0.6972 15400 0.3558 24253504
0.3411 0.7062 15600 0.3427 24568160
0.3495 0.7153 15800 0.3555 24882112
0.3408 0.7243 16000 0.3423 25201792
0.3424 0.7334 16200 0.3400 25518176
0.3365 0.7424 16400 0.3473 25832000
0.3271 0.7515 16600 0.3421 26142144
0.3369 0.7605 16800 0.3384 26458432
0.3239 0.7696 17000 0.3389 26771360
0.3311 0.7787 17200 0.3452 27085568
0.3313 0.7877 17400 0.3452 27401344
0.3731 0.7968 17600 0.3483 27721120
0.3405 0.8058 17800 0.3634 28035200
0.3366 0.8149 18000 0.3502 28351968
0.3289 0.8239 18200 0.3402 28668224
0.3341 0.8330 18400 0.3366 28981824
0.3394 0.8420 18600 0.3393 29293792
0.3361 0.8511 18800 0.3393 29608320
0.3442 0.8601 19000 0.3450 29922016
0.3339 0.8692 19200 0.3402 30237280
0.3202 0.8782 19400 0.3397 30550560
0.3519 0.8873 19600 0.3416 30861952
0.3468 0.8964 19800 0.3371 31176736
0.3444 0.9054 20000 0.3381 31490688
0.3351 0.9145 20200 0.3398 31805440
0.333 0.9235 20400 0.3397 32120672
0.338 0.9326 20600 0.3461 32434592
0.3155 0.9416 20800 0.3606 32746528
0.3369 0.9507 21000 0.3383 33062880
0.3416 0.9597 21200 0.3434 33380032
0.3255 0.9688 21400 0.3370 33698368
0.3502 0.9778 21600 0.3454 34015424
0.3438 0.9869 21800 0.3385 34331520
0.3429 0.9959 22000 0.3403 34642688
0.3284 1.0050 22200 0.3369 34959928
0.3496 1.0140 22400 0.3385 35273880
0.3439 1.0231 22600 0.3438 35587832
0.3232 1.0321 22800 0.3544 35899672
0.3458 1.0412 23000 0.3367 36212824
0.3235 1.0503 23200 0.3412 36528792
0.3366 1.0593 23400 0.3433 36844024
0.3171 1.0684 23600 0.3484 37157784
0.3393 1.0774 23800 0.3376 37469272
0.3228 1.0865 24000 0.3393 37785112
0.3421 1.0955 24200 0.3407 38101496
0.3278 1.1046 24400 0.3367 38418456
0.3339 1.1136 24600 0.3363 38735256
0.3269 1.1227 24800 0.3373 39051640
0.3165 1.1317 25000 0.3460 39365176
0.3314 1.1408 25200 0.3363 39684408
0.3261 1.1498 25400 0.3382 40000056
0.3926 1.1589 25600 0.3387 40316632
0.3271 1.1680 25800 0.3372 40629528
0.3322 1.1770 26000 0.3377 40944536
0.3308 1.1861 26200 0.3392 41261208
0.3364 1.1951 26400 0.3383 41575992
0.3509 1.2042 26600 0.3374 41888504
0.3479 1.2132 26800 0.3377 42202072
0.3414 1.2223 27000 0.3382 42518168
0.3194 1.2313 27200 0.3417 42833560
0.3444 1.2404 27400 0.3376 43144152
0.3317 1.2494 27600 0.3368 43457272
0.3299 1.2585 27800 0.3372 43774104
0.3497 1.2675 28000 0.3415 44088120
0.3222 1.2766 28200 0.3422 44401112
0.3442 1.2857 28400 0.3384 44718232
0.3279 1.2947 28600 0.3383 45031416
0.3401 1.3038 28800 0.3385 45340984
0.3375 1.3128 29000 0.3359 45659256
0.3562 1.3219 29200 0.3362 45975384
0.3654 1.3309 29400 0.3392 46290296
0.3331 1.3400 29600 0.3375 46604312
0.3359 1.3490 29800 0.3362 46919192
0.348 1.3581 30000 0.3371 47236440
0.349 1.3671 30200 0.3418 47550744
0.3208 1.3762 30400 0.3362 47865912
0.3381 1.3853 30600 0.3363 48183992
0.3078 1.3943 30800 0.3402 48495160
0.3325 1.4034 31000 0.3371 48813176
0.3285 1.4124 31200 0.3373 49129080
0.3363 1.4215 31400 0.3372 49444664
0.3384 1.4305 31600 0.3361 49756312
0.3271 1.4396 31800 0.3365 50068088
0.3304 1.4486 32000 0.3371 50382136
0.3235 1.4577 32200 0.3377 50700344
0.3493 1.4667 32400 0.3360 51012696
0.3167 1.4758 32600 0.3391 51328696
0.3436 1.4848 32800 0.3364 51641752
0.3476 1.4939 33000 0.3390 51954840
0.3266 1.5030 33200 0.3373 52269720
0.3463 1.5120 33400 0.3399 52585784
0.3401 1.5211 33600 0.3361 52898904
0.3208 1.5301 33800 0.3369 53217208
0.33 1.5392 34000 0.3370 53532408
0.3347 1.5482 34200 0.3365 53849208
0.3296 1.5573 34400 0.3369 54166040
0.3224 1.5663 34600 0.3363 54482232
0.3293 1.5754 34800 0.3365 54797880
0.342 1.5844 35000 0.3362 55112536
0.333 1.5935 35200 0.3361 55427928
0.32 1.6025 35400 0.3361 55741912
0.3627 1.6116 35600 0.3363 56057048
0.3296 1.6207 35800 0.3369 56371640
0.3282 1.6297 36000 0.3361 56683896
0.3493 1.6388 36200 0.3359 57003192
0.3369 1.6478 36400 0.3369 57318104
0.3346 1.6569 36600 0.3363 57632152
0.3272 1.6659 36800 0.3365 57948856
0.3323 1.6750 37000 0.3362 58266232
0.3297 1.6840 37200 0.3360 58583544
0.3258 1.6931 37400 0.3361 58903288
0.3133 1.7021 37600 0.3374 59218296
0.3205 1.7112 37800 0.3357 59533240
0.3225 1.7203 38000 0.3364 59848664
0.3337 1.7293 38200 0.3366 60164984
0.3303 1.7384 38400 0.3364 60478328
0.3251 1.7474 38600 0.3364 60787576
0.334 1.7565 38800 0.3366 61097848
0.3268 1.7655 39000 0.3365 61413432
0.3426 1.7746 39200 0.3362 61727320
0.3281 1.7836 39400 0.3364 62041848
0.3122 1.7927 39600 0.3363 62358168
0.3117 1.8017 39800 0.3363 62670392
0.3362 1.8108 40000 0.3363 62984280

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1