train_sst2_1744902626

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3423
  • Num Input Tokens Seen: 33458560
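
The framework versions below list PEFT, and the model tree identifies this repo as an adapter of the base model, so loading it presumably means attaching the adapter to mistralai/Mistral-7B-Instruct-v0.3. The following is a minimal sketch along those lines; the prompt template is a guess, since the training format isn't documented in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "rbelanec/train_sst2_1744902626"

tokenizer = AutoTokenizer.from_pretrained(base_id)
# device_map="auto" requires the accelerate package
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the PEFT adapter hosted in this repo to the base model
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical SST-2-style prompt; the exact template used during training
# is not documented in this card, so adjust as needed.
prompt = (
    "Classify the sentiment of the following sentence as positive or negative.\n"
    "Sentence: a gorgeous, witty, seductive movie.\n"
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated label tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If training used a different prompt format, the generated label may differ; the sketch only illustrates the standard base-plus-adapter loading path.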

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.3
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
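
As a rough illustration, the list above would map onto a transformers TrainingArguments configuration like the sketch below. The actual training script is not part of this card, so treat every field as a reconstruction rather than the author's code:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameter list above; output_dir is assumed
# to match the run name.
args = TrainingArguments(
    output_dir="train_sst2_1744902626",
    learning_rate=0.3,              # far above typical full fine-tuning rates;
                                    # plausible for prompt-tuning-style PEFT
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    gradient_accumulation_steps=4,  # 4 x 4 = total train batch size of 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40_000,
)
```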

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3637 | 0.0528 | 200 | 0.3444 | 166688 |
| 0.3455 | 0.1056 | 400 | 0.3437 | 334048 |
| 0.3695 | 0.1584 | 600 | 0.3603 | 500448 |
| 0.3481 | 0.2112 | 800 | 0.3463 | 667872 |
| 0.3391 | 0.2640 | 1000 | 0.3454 | 834848 |
| 0.3507 | 0.3167 | 1200 | 0.3469 | 1002816 |
| 0.3419 | 0.3695 | 1400 | 0.3456 | 1169088 |
| 0.3257 | 0.4223 | 1600 | 0.3657 | 1337088 |
| 0.3429 | 0.4751 | 1800 | 0.3433 | 1505536 |
| 0.3504 | 0.5279 | 2000 | 0.3490 | 1673024 |
| 0.3429 | 0.5807 | 2200 | 0.3441 | 1842304 |
| 0.3464 | 0.6335 | 2400 | 0.3458 | 2007328 |
| 0.351 | 0.6863 | 2600 | 0.3440 | 2174880 |
| 0.3433 | 0.7391 | 2800 | 0.3451 | 2341280 |
| 0.3421 | 0.7919 | 3000 | 0.3450 | 2509440 |
| 0.3478 | 0.8447 | 3200 | 0.3435 | 2674784 |
| 0.3415 | 0.8975 | 3400 | 0.3462 | 2843680 |
| 0.3468 | 0.9502 | 3600 | 0.3461 | 3011904 |
| 0.3399 | 1.0029 | 3800 | 0.3438 | 3178064 |
| 0.3459 | 1.0557 | 4000 | 0.3573 | 3345904 |
| 0.345 | 1.1085 | 4200 | 0.3439 | 3514608 |
| 0.3406 | 1.1613 | 4400 | 0.3457 | 3680560 |
| 0.3453 | 1.2141 | 4600 | 0.3503 | 3849328 |
| 0.3254 | 1.2669 | 4800 | 0.3537 | 4017200 |
| 0.3425 | 1.3197 | 5000 | 0.3441 | 4187184 |
| 0.3492 | 1.3724 | 5200 | 0.3441 | 4354416 |
| 0.3434 | 1.4252 | 5400 | 0.3433 | 4519856 |
| 0.3464 | 1.4780 | 5600 | 0.3441 | 4687280 |
| 0.3317 | 1.5308 | 5800 | 0.3459 | 4856112 |
| 0.3447 | 1.5836 | 6000 | 0.3432 | 5022736 |
| 0.3406 | 1.6364 | 6200 | 0.3579 | 5188656 |
| 0.3669 | 1.6892 | 6400 | 0.3445 | 5356208 |
| 0.3639 | 1.7420 | 6600 | 0.3451 | 5523952 |
| 0.3498 | 1.7948 | 6800 | 0.3482 | 5690672 |
| 0.343 | 1.8476 | 7000 | 0.3465 | 5857072 |
| 0.3323 | 1.9004 | 7200 | 0.3451 | 6024976 |
| 0.3487 | 1.9531 | 7400 | 0.3530 | 6191664 |
| 0.3554 | 2.0058 | 7600 | 0.3564 | 6357472 |
| 0.3597 | 2.0586 | 7800 | 0.3433 | 6525984 |
| 0.3421 | 2.1114 | 8000 | 0.3443 | 6692320 |
| 0.3354 | 2.1642 | 8200 | 0.3465 | 6860064 |
| 0.3402 | 2.2170 | 8400 | 0.3457 | 7026528 |
| 0.3265 | 2.2698 | 8600 | 0.3502 | 7192384 |
| 0.3434 | 2.3226 | 8800 | 0.3432 | 7358816 |
| 0.3501 | 2.3753 | 9000 | 0.3467 | 7526496 |
| 0.3335 | 2.4281 | 9200 | 0.3440 | 7696064 |
| 0.3457 | 2.4809 | 9400 | 0.3555 | 7863456 |
| 0.3506 | 2.5337 | 9600 | 0.3517 | 8031776 |
| 0.3415 | 2.5865 | 9800 | 0.3464 | 8199584 |
| 0.347 | 2.6393 | 10000 | 0.3493 | 8366016 |
| 0.3532 | 2.6921 | 10200 | 0.3436 | 8531808 |
| 0.3483 | 2.7449 | 10400 | 0.3497 | 8702976 |
| 0.3338 | 2.7977 | 10600 | 0.3441 | 8870944 |
| 0.332 | 2.8505 | 10800 | 0.3435 | 9039680 |
| 0.3506 | 2.9033 | 11000 | 0.3500 | 9206880 |
| 0.3434 | 2.9561 | 11200 | 0.3450 | 9372128 |
| 0.3402 | 3.0087 | 11400 | 0.3442 | 9538768 |
| 0.3277 | 3.0615 | 11600 | 0.3434 | 9705232 |
| 0.3479 | 3.1143 | 11800 | 0.3466 | 9871632 |
| 0.314 | 3.1671 | 12000 | 0.3493 | 10039472 |
| 0.324 | 3.2199 | 12200 | 0.3525 | 10206320 |
| 0.342 | 3.2727 | 12400 | 0.3447 | 10376240 |
| 0.356 | 3.3255 | 12600 | 0.3428 | 10544464 |
| 0.3513 | 3.3782 | 12800 | 0.3431 | 10712240 |
| 0.347 | 3.4310 | 13000 | 0.3439 | 10879120 |
| 0.3321 | 3.4838 | 13200 | 0.3434 | 11045072 |
| 0.3421 | 3.5366 | 13400 | 0.3449 | 11211312 |
| 0.3401 | 3.5894 | 13600 | 0.3432 | 11378128 |
| 0.334 | 3.6422 | 13800 | 0.3432 | 11544592 |
| 0.3435 | 3.6950 | 14000 | 0.3430 | 11713040 |
| 0.3444 | 3.7478 | 14200 | 0.3442 | 11880432 |
| 0.346 | 3.8006 | 14400 | 0.3490 | 12048176 |
| 0.3442 | 3.8534 | 14600 | 0.3438 | 12215792 |
| 0.3477 | 3.9062 | 14800 | 0.3530 | 12383792 |
| 0.3437 | 3.9590 | 15000 | 0.3434 | 12549680 |
| 0.3473 | 4.0116 | 15200 | 0.3430 | 12716448 |
| 0.3526 | 4.0644 | 15400 | 0.3430 | 12882752 |
| 0.331 | 4.1172 | 15600 | 0.3441 | 13051200 |
| 0.3377 | 4.1700 | 15800 | 0.3447 | 13217024 |
| 0.3238 | 4.2228 | 16000 | 0.3485 | 13382784 |
| 0.3572 | 4.2756 | 16200 | 0.3460 | 13549216 |
| 0.3457 | 4.3284 | 16400 | 0.3428 | 13719072 |
| 0.3496 | 4.3812 | 16600 | 0.3453 | 13884928 |
| 0.3389 | 4.4339 | 16800 | 0.3475 | 14051584 |
| 0.3379 | 4.4867 | 17000 | 0.3435 | 14220704 |
| 0.3461 | 4.5395 | 17200 | 0.3442 | 14387008 |
| 0.3119 | 4.5923 | 17400 | 0.3498 | 14555808 |
| 0.3377 | 4.6451 | 17600 | 0.3429 | 14723456 |
| 0.3457 | 4.6979 | 17800 | 0.3457 | 14890880 |
| 0.3371 | 4.7507 | 18000 | 0.3449 | 15059744 |
| 0.3329 | 4.8035 | 18200 | 0.3441 | 15224512 |
| 0.3479 | 4.8563 | 18400 | 0.3442 | 15392960 |
| 0.3539 | 4.9091 | 18600 | 0.3427 | 15561696 |
| 0.3553 | 4.9619 | 18800 | 0.3432 | 15728800 |
| 0.3432 | 5.0145 | 19000 | 0.3435 | 15897552 |
| 0.3556 | 5.0673 | 19200 | 0.3511 | 16064688 |
| 0.3497 | 5.1201 | 19400 | 0.3439 | 16231120 |
| 0.3335 | 5.1729 | 19600 | 0.3445 | 16397744 |
| 0.3646 | 5.2257 | 19800 | 0.3435 | 16564176 |
| 0.325 | 5.2785 | 20000 | 0.3429 | 16731600 |
| 0.3426 | 5.3313 | 20200 | 0.3434 | 16898064 |
| 0.3327 | 5.3841 | 20400 | 0.3523 | 17064080 |
| 0.342 | 5.4368 | 20600 | 0.3433 | 17231888 |
| 0.332 | 5.4896 | 20800 | 0.3440 | 17399184 |
| 0.3488 | 5.5424 | 21000 | 0.3437 | 17566160 |
| 0.3348 | 5.5952 | 21200 | 0.3486 | 17732304 |
| 0.3329 | 5.6480 | 21400 | 0.3519 | 17900880 |
| 0.3423 | 5.7008 | 21600 | 0.3440 | 18070192 |
| 0.3557 | 5.7536 | 21800 | 0.3533 | 18237168 |
| 0.3464 | 5.8064 | 22000 | 0.3430 | 18403856 |
| 0.3437 | 5.8592 | 22200 | 0.3446 | 18571248 |
| 0.3373 | 5.9120 | 22400 | 0.3451 | 18738672 |
| 0.3372 | 5.9648 | 22600 | 0.3452 | 18905744 |
| 0.3675 | 6.0174 | 22800 | 0.3441 | 19073440 |
| 0.3512 | 6.0702 | 23000 | 0.3428 | 19241920 |
| 0.3418 | 6.1230 | 23200 | 0.3427 | 19409408 |
| 0.3412 | 6.1758 | 23400 | 0.3446 | 19577024 |
| 0.3488 | 6.2286 | 23600 | 0.3429 | 19744608 |
| 0.338 | 6.2814 | 23800 | 0.3474 | 19911488 |
| 0.3409 | 6.3342 | 24000 | 0.3443 | 20078944 |
| 0.3375 | 6.3870 | 24200 | 0.3438 | 20244928 |
| 0.3404 | 6.4398 | 24400 | 0.3426 | 20411232 |
| 0.3424 | 6.4925 | 24600 | 0.3428 | 20578080 |
| 0.3417 | 6.5453 | 24800 | 0.3445 | 20746592 |
| 0.3544 | 6.5981 | 25000 | 0.3440 | 20913344 |
| 0.3352 | 6.6509 | 25200 | 0.3441 | 21081952 |
| 0.3495 | 6.7037 | 25400 | 0.3426 | 21248384 |
| 0.3422 | 6.7565 | 25600 | 0.3430 | 21415872 |
| 0.344 | 6.8093 | 25800 | 0.3432 | 21584000 |
| 0.3383 | 6.8621 | 26000 | 0.3427 | 21751168 |
| 0.3391 | 6.9149 | 26200 | 0.3436 | 21918816 |
| 0.343 | 6.9677 | 26400 | 0.3428 | 22084384 |
| 0.3383 | 7.0203 | 26600 | 0.3433 | 22251776 |
| 0.3386 | 7.0731 | 26800 | 0.3442 | 22418080 |
| 0.3427 | 7.1259 | 27000 | 0.3434 | 22587392 |
| 0.3728 | 7.1787 | 27200 | 0.3436 | 22753056 |
| 0.3759 | 7.2315 | 27400 | 0.3459 | 22920768 |
| 0.3408 | 7.2843 | 27600 | 0.3437 | 23087296 |
| 0.3329 | 7.3371 | 27800 | 0.3430 | 23254400 |
| 0.3349 | 7.3899 | 28000 | 0.3427 | 23422752 |
| 0.3478 | 7.4427 | 28200 | 0.3424 | 23588352 |
| 0.3361 | 7.4954 | 28400 | 0.3444 | 23755840 |
| 0.3314 | 7.5482 | 28600 | 0.3429 | 23923680 |
| 0.3451 | 7.6010 | 28800 | 0.3428 | 24091168 |
| 0.3328 | 7.6538 | 29000 | 0.3431 | 24258016 |
| 0.3383 | 7.7066 | 29200 | 0.3430 | 24427808 |
| 0.3302 | 7.7594 | 29400 | 0.3443 | 24596288 |
| 0.3389 | 7.8122 | 29600 | 0.3429 | 24764192 |
| 0.3221 | 7.8650 | 29800 | 0.3473 | 24932000 |
| 0.3373 | 7.9178 | 30000 | 0.3434 | 25100224 |
| 0.3454 | 7.9706 | 30200 | 0.3429 | 25267808 |
| 0.3342 | 8.0232 | 30400 | 0.3437 | 25433440 |
| 0.3329 | 8.0760 | 30600 | 0.3432 | 25600672 |
| 0.3287 | 8.1288 | 30800 | 0.3436 | 25769408 |
| 0.3559 | 8.1816 | 31000 | 0.3429 | 25936160 |
| 0.348 | 8.2344 | 31200 | 0.3430 | 26103744 |
| 0.3229 | 8.2872 | 31400 | 0.3457 | 26270560 |
| 0.3382 | 8.3400 | 31600 | 0.3426 | 26437536 |
| 0.3525 | 8.3928 | 31800 | 0.3434 | 26604480 |
| 0.3446 | 8.4456 | 32000 | 0.3433 | 26771680 |
| 0.3464 | 8.4984 | 32200 | 0.3428 | 26940256 |
| 0.3424 | 8.5511 | 32400 | 0.3431 | 27107680 |
| 0.3333 | 8.6039 | 32600 | 0.3438 | 27274048 |
| 0.3384 | 8.6567 | 32800 | 0.3427 | 27440544 |
| 0.3515 | 8.7095 | 33000 | 0.3430 | 27608000 |
| 0.3536 | 8.7623 | 33200 | 0.3443 | 27776704 |
| 0.3475 | 8.8151 | 33400 | 0.3427 | 27942752 |
| 0.353 | 8.8679 | 33600 | 0.3428 | 28108864 |
| 0.3371 | 8.9207 | 33800 | 0.3425 | 28275296 |
| 0.3487 | 8.9735 | 34000 | 0.3428 | 28443520 |
| 0.3397 | 9.0261 | 34200 | 0.3427 | 28609776 |
| 0.3408 | 9.0789 | 34400 | 0.3423 | 28777712 |
| 0.3417 | 9.1317 | 34600 | 0.3429 | 28944144 |
| 0.3468 | 9.1845 | 34800 | 0.3435 | 29111152 |
| 0.3391 | 9.2373 | 35000 | 0.3429 | 29278000 |
| 0.345 | 9.2901 | 35200 | 0.3431 | 29443792 |
| 0.3352 | 9.3429 | 35400 | 0.3430 | 29609072 |
| 0.3376 | 9.3957 | 35600 | 0.3433 | 29776592 |
| 0.3505 | 9.4485 | 35800 | 0.3428 | 29941616 |
| 0.3436 | 9.5013 | 36000 | 0.3432 | 30110160 |
| 0.3342 | 9.5540 | 36200 | 0.3423 | 30277744 |
| 0.3458 | 9.6068 | 36400 | 0.3433 | 30447152 |
| 0.3409 | 9.6596 | 36600 | 0.3429 | 30612976 |
| 0.3224 | 9.7124 | 36800 | 0.3430 | 30780240 |
| 0.3574 | 9.7652 | 37000 | 0.3430 | 30948048 |
| 0.3332 | 9.8180 | 37200 | 0.3429 | 31116368 |
| 0.3526 | 9.8708 | 37400 | 0.3425 | 31283888 |
| 0.3402 | 9.9236 | 37600 | 0.3431 | 31452560 |
| 0.3337 | 9.9764 | 37800 | 0.3428 | 31620720 |
| 0.3472 | 10.0290 | 38000 | 0.3428 | 31786016 |
| 0.3257 | 10.0818 | 38200 | 0.3429 | 31952768 |
| 0.3231 | 10.1346 | 38400 | 0.3432 | 32120320 |
| 0.3361 | 10.1874 | 38600 | 0.3431 | 32287584 |
| 0.3327 | 10.2402 | 38800 | 0.3429 | 32455072 |
| 0.3458 | 10.2930 | 39000 | 0.3429 | 32621184 |
| 0.3356 | 10.3458 | 39200 | 0.3430 | 32788960 |
| 0.3475 | 10.3986 | 39400 | 0.3431 | 32955776 |
| 0.3332 | 10.4514 | 39600 | 0.3431 | 33122816 |
| 0.3246 | 10.5042 | 39800 | 0.3432 | 33291072 |
| 0.3482 | 10.5569 | 40000 | 0.3433 | 33458560 |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
Model tree for rbelanec/train_sst2_1744902626

This model is an adapter of mistralai/Mistral-7B-Instruct-v0.3.