train_multirc_1745950264

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3193
  • Num Input Tokens Seen: 75778784
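
Because this run was trained with PEFT (see the framework versions below), the checkpoint is an adapter rather than a full set of model weights. The snippet below is a minimal loading sketch, assuming the adapter is hosted on the Hub as rbelanec/train_multirc_1745950264 and attaches to the base model named above; the prompt string is a placeholder, not the exact MultiRC prompt format used during training.

```python
# Minimal sketch: load the base model, then attach this adapter with PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_1745950264"  # this adapter's Hub repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Placeholder prompt; the exact MultiRC formatting used in training is not documented here.
prompt = "Paragraph: ...\nQuestion: ...\nCandidate answer: ...\nIs the candidate answer correct?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```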

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.3
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
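
The sketch below maps the hyperparameters above onto transformers.TrainingArguments as one plausible reconstruction of the trainer configuration; argument names are standard Transformers options, but output_dir and anything not listed above (logging, saving, precision) are placeholders rather than settings taken from this card.

```python
# Hedged reconstruction of the trainer configuration from the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_1745950264",  # placeholder, not from this card
    learning_rate=0.3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,          # total train batch size: 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40000,
)
```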

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.4564 0.0326 200 0.4853 378944
0.3723 0.0653 400 0.3704 758192
0.3559 0.0979 600 0.3773 1141408
0.3491 0.1305 800 0.3604 1518336
0.3909 0.1631 1000 0.3607 1901264
0.3915 0.1958 1200 0.3488 2279552
0.3655 0.2284 1400 0.3567 2668256
0.375 0.2610 1600 0.3505 3047328
0.4324 0.2937 1800 0.4502 3429984
0.3761 0.3263 2000 0.4696 3814576
0.3441 0.3589 2200 0.3639 4190352
0.3423 0.3915 2400 0.3468 4567440
0.3596 0.4242 2600 0.3570 4944384
0.3747 0.4568 2800 0.3450 5325216
0.3322 0.4894 3000 0.4611 5698896
0.3631 0.5221 3200 0.3524 6074432
0.357 0.5547 3400 0.3523 6454208
0.3337 0.5873 3600 0.3636 6831056
0.3765 0.6200 3800 0.3738 7209536
0.3645 0.6526 4000 0.4008 7593024
0.3501 0.6852 4200 0.3490 7977072
0.3648 0.7178 4400 0.3519 8353296
0.3577 0.7505 4600 0.3439 8733232
0.3691 0.7831 4800 0.3404 9113632
0.3779 0.8157 5000 0.3713 9487952
0.3499 0.8484 5200 0.3861 9861104
0.3244 0.8810 5400 0.3939 10239088
0.3561 0.9136 5600 0.3590 10619840
0.4045 0.9462 5800 0.3432 10994720
0.3709 0.9789 6000 0.3536 11376976
0.3859 1.0114 6200 0.3500 11758656
0.4941 1.0440 6400 0.3453 12144016
0.373 1.0767 6600 0.3695 12531776
0.3413 1.1093 6800 0.3387 12905136
0.4019 1.1419 7000 0.3544 13278096
0.3835 1.1746 7200 0.4184 13651520
0.3365 1.2072 7400 0.3427 14034784
0.3167 1.2398 7600 0.3402 14415120
0.3442 1.2725 7800 0.3339 14794784
0.3655 1.3051 8000 0.3357 15176240
0.3667 1.3377 8200 0.3359 15548080
0.3315 1.3703 8400 0.3355 15926832
0.3966 1.4030 8600 0.3776 16305344
0.3021 1.4356 8800 0.3357 16686528
0.3295 1.4682 9000 0.3456 17073648
0.3563 1.5009 9200 0.3692 17457952
0.3424 1.5335 9400 0.3505 17831104
0.3343 1.5661 9600 0.3346 18215168
0.3255 1.5987 9800 0.3486 18592816
0.3551 1.6314 10000 0.3564 18972864
0.3223 1.6640 10200 0.3309 19350160
0.3489 1.6966 10400 0.3513 19735024
0.2748 1.7293 10600 0.3290 20108768
0.3823 1.7619 10800 0.3371 20489424
0.3643 1.7945 11000 0.3394 20870832
0.3802 1.8271 11200 0.3317 21240960
0.3923 1.8598 11400 0.3403 21615744
0.3803 1.8924 11600 0.3289 21991984
0.3738 1.9250 11800 0.3350 22366624
0.3144 1.9577 12000 0.3476 22746000
0.3205 1.9903 12200 0.3514 23122688
0.3868 2.0228 12400 0.3390 23494112
0.3732 2.0555 12600 0.3319 23876160
0.3627 2.0881 12800 0.3286 24261904
0.3366 2.1207 13000 0.3427 24643776
0.3541 2.1534 13200 0.3607 25020496
0.3233 2.1860 13400 0.3272 25391072
0.3341 2.2186 13600 0.3402 25762416
0.3562 2.2512 13800 0.3296 26139456
0.3395 2.2839 14000 0.3311 26511344
0.3766 2.3165 14200 0.3433 26891616
0.3558 2.3491 14400 0.3296 27274960
0.3056 2.3818 14600 0.4714 27652224
0.3646 2.4144 14800 0.3235 28033168
0.3433 2.4470 15000 0.3337 28414784
0.3656 2.4796 15200 0.3255 28787168
0.3642 2.5123 15400 0.3283 29164512
0.3603 2.5449 15600 0.3529 29545056
0.3109 2.5775 15800 0.3305 29922176
0.3171 2.6102 16000 0.3256 30304336
0.3595 2.6428 16200 0.3384 30688608
0.2938 2.6754 16400 0.3256 31067744
0.3434 2.7081 16600 0.3326 31455328
0.3638 2.7407 16800 0.3246 31833136
0.3623 2.7733 17000 0.3232 32213296
0.3703 2.8059 17200 0.3346 32588128
0.3248 2.8386 17400 0.3223 32971552
0.3379 2.8712 17600 0.3280 33356064
0.3434 2.9038 17800 0.3244 33739984
0.288 2.9365 18000 0.3247 34121824
0.3275 2.9691 18200 0.3452 34498368
0.2901 3.0016 18400 0.3241 34866272
0.3238 3.0343 18600 0.3236 35258768
0.3617 3.0669 18800 0.3654 35644416
0.3463 3.0995 19000 0.3730 36017808
0.3218 3.1321 19200 0.3243 36393536
0.3652 3.1648 19400 0.3416 36770432
0.3161 3.1974 19600 0.3264 37152448
0.3487 3.2300 19800 0.3428 37532496
0.3284 3.2627 20000 0.3321 37910480
0.3752 3.2953 20200 0.3265 38286080
0.3216 3.3279 20400 0.3320 38664512
0.3357 3.3606 20600 0.3293 39053472
0.3281 3.3932 20800 0.3502 39432032
0.3459 3.4258 21000 0.3226 39812704
0.3314 3.4584 21200 0.3277 40191088
0.422 3.4911 21400 0.3282 40567216
0.3273 3.5237 21600 0.3569 40947696
0.2933 3.5563 21800 0.3224 41330624
0.4252 3.5890 22000 0.3256 41708800
0.3007 3.6216 22200 0.3229 42087824
0.324 3.6542 22400 0.3247 42461936
0.3796 3.6868 22600 0.3240 42843696
0.2865 3.7195 22800 0.3220 43221120
0.3379 3.7521 23000 0.3212 43597776
0.3036 3.7847 23200 0.3242 43979312
0.3774 3.8174 23400 0.3328 44354480
0.3691 3.8500 23600 0.3250 44727696
0.361 3.8826 23800 0.3240 45108608
0.3729 3.9152 24000 0.3245 45482928
0.3025 3.9479 24200 0.3228 45861584
0.3069 3.9805 24400 0.3221 46243072
0.338 4.0131 24600 0.3205 46619680
0.3089 4.0457 24800 0.3210 47007360
0.2882 4.0783 25000 0.3292 47391600
0.3406 4.1109 25200 0.3224 47768320
0.3066 4.1436 25400 0.3206 48143424
0.4355 4.1762 25600 0.3240 48524368
0.3275 4.2088 25800 0.3205 48899856
0.2758 4.2415 26000 0.3293 49280208
0.3199 4.2741 26200 0.3263 49658080
0.3332 4.3067 26400 0.3221 50034848
0.3795 4.3393 26600 0.3269 50413376
0.3959 4.3720 26800 0.3216 50793248
0.3603 4.4046 27000 0.3215 51170976
0.321 4.4372 27200 0.3250 51559504
0.3445 4.4699 27400 0.3210 51928704
0.3458 4.5025 27600 0.3215 52297776
0.2784 4.5351 27800 0.3208 52669472
0.3428 4.5677 28000 0.3275 53045856
0.3063 4.6004 28200 0.3216 53429232
0.2983 4.6330 28400 0.3279 53810560
0.319 4.6656 28600 0.3227 54191536
0.3985 4.6983 28800 0.3220 54572176
0.3425 4.7309 29000 0.3200 54952896
0.236 4.7635 29200 0.3214 55327776
0.3071 4.7961 29400 0.3208 55708896
0.3007 4.8288 29600 0.3212 56085712
0.3351 4.8614 29800 0.3227 56467376
0.3147 4.8940 30000 0.3314 56841328
0.3661 4.9267 30200 0.3220 57227184
0.2419 4.9593 30400 0.3258 57605632
0.3353 4.9919 30600 0.3199 57987472
0.335 5.0245 30800 0.3206 58367056
0.3406 5.0571 31000 0.3249 58746720
0.3194 5.0897 31200 0.3243 59124272
0.3977 5.1224 31400 0.3198 59504688
0.26 5.1550 31600 0.3204 59875840
0.3123 5.1876 31800 0.3194 60247360
0.3155 5.2202 32000 0.3232 60622464
0.3201 5.2529 32200 0.3215 61006768
0.3427 5.2855 32400 0.3200 61386992
0.3274 5.3181 32600 0.3216 61770000
0.3653 5.3508 32800 0.3199 62154640
0.2958 5.3834 33000 0.3205 62541664
0.2897 5.4160 33200 0.3193 62912976
0.3029 5.4486 33400 0.3222 63289520
0.3473 5.4813 33600 0.3204 63668416
0.2732 5.5139 33800 0.3205 64043792
0.3241 5.5465 34000 0.3200 64433840
0.2618 5.5792 34200 0.3214 64808624
0.2975 5.6118 34400 0.3199 65182704
0.326 5.6444 34600 0.3209 65562192
0.3478 5.6771 34800 0.3232 65940816
0.3652 5.7097 35000 0.3198 66326768
0.3207 5.7423 35200 0.3226 66705744
0.3184 5.7749 35400 0.3231 67084928
0.3183 5.8076 35600 0.3237 67462064
0.2891 5.8402 35800 0.3237 67846112
0.3224 5.8728 36000 0.3218 68221552
0.2998 5.9055 36200 0.3216 68606416
0.3256 5.9381 36400 0.3235 68980176
0.3077 5.9707 36600 0.3209 69349984
0.2706 6.0033 36800 0.3216 69729984
0.3618 6.0359 37000 0.3231 70107936
0.3143 6.0685 37200 0.3212 70487856
0.3113 6.1012 37400 0.3203 70865792
0.2601 6.1338 37600 0.3201 71244784
0.3489 6.1664 37800 0.3193 71630704
0.2823 6.1990 38000 0.3195 72002688
0.3585 6.2317 38200 0.3198 72385776
0.2892 6.2643 38400 0.3207 72773152
0.3261 6.2969 38600 0.3205 73149584
0.274 6.3296 38800 0.3197 73519536
0.2749 6.3622 39000 0.3201 73902896
0.3728 6.3948 39200 0.3206 74278960
0.3704 6.4274 39400 0.3205 74655728
0.2959 6.4601 39600 0.3202 75025808
0.321 6.4927 39800 0.3202 75402576
0.2811 6.5253 40000 0.3206 75778784

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
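
To check that a local environment matches the versions listed above, a small (assumed, not part of the original training setup) version probe can be run:

```python
# Print installed versions of the packages pinned in "Framework versions".
from importlib.metadata import version

for pkg in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    print(pkg, version(pkg))
```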