train_wic_1745950287

This model is a PEFT adapter fine-tuned from google/gemma-3-1b-it on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5085
  • Num Input Tokens Seen: 13031928
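Since the framework versions below list PEFT, the checkpoint is an adapter rather than a full set of fine-tuned weights, so it must be attached to the base model at load time. A minimal loading sketch (repo ids are taken from this card; the exact prompt format used during training is not documented here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-3-1b-it"
adapter_id = "rbelanec/train_wic_1745950287"

# Load the frozen base model, then attach the fine-tuned adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative only: the WiC prompt template used in training is not
# specified on this card, so format inputs accordingly.
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```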

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
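For reference, a sketch of a transformers TrainingArguments configuration matching the list above (a reconstruction using standard transformers field names, not the authors' actual training script):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_1745950287",
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    gradient_accumulation_steps=2,  # effective train batch size: 2 * 2 = 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
)
```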

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
3.574 0.1637 200 3.7502 65024
3.2951 0.3275 400 3.6551 129984
3.8231 0.4912 600 3.6054 195024
3.0376 0.6549 800 3.6036 260624
3.8106 0.8187 1000 3.6071 325984
3.9018 0.9824 1200 3.6313 391280
3.3753 1.1457 1400 3.6102 456248
3.3904 1.3095 1600 3.6261 521464
3.3409 1.4732 1800 3.5766 586632
3.1271 1.6369 2000 3.6092 651384
3.4137 1.8007 2200 3.6021 716552
3.557 1.9644 2400 3.5911 781992
3.3821 2.1277 2600 3.5622 847136
3.0418 2.2914 2800 3.5362 912064
3.2648 2.4552 3000 3.5774 977312
2.9868 2.6189 3200 3.5574 1042608
3.8563 2.7826 3400 3.5400 1107488
3.0897 2.9464 3600 3.5933 1172864
4.0118 3.1097 3800 3.5637 1238392
3.8601 3.2734 4000 3.5533 1303640
2.942 3.4372 4200 3.5551 1368504
3.9586 3.6009 4400 3.5797 1433480
3.6904 3.7646 4600 3.5560 1499016
3.5232 3.9284 4800 3.5318 1563880
3.5762 4.0917 5000 3.5259 1628808
3.1249 4.2554 5200 3.5340 1693576
2.9885 4.4192 5400 3.5437 1758536
3.265 4.5829 5600 3.5389 1823544
3.8197 4.7466 5800 3.5350 1889272
2.9118 4.9104 6000 3.5583 1954632
4.2121 5.0737 6200 3.5233 2019440
4.0256 5.2374 6400 3.5461 2084816
3.1994 5.4011 6600 3.5247 2149632
3.8408 5.5649 6800 3.5577 2214864
3.1499 5.7286 7000 3.5142 2280368
3.8928 5.8923 7200 3.5252 2345632
3.2538 6.0557 7400 3.5120 2410768
3.1951 6.2194 7600 3.5508 2476096
3.7805 6.3831 7800 3.5411 2541152
3.731 6.5469 8000 3.5363 2606016
3.4264 6.7106 8200 3.5124 2670896
3.7905 6.8743 8400 3.5184 2736160
3.7571 7.0377 8600 3.5337 2801120
3.5284 7.2014 8800 3.5325 2865872
3.2259 7.3651 9000 3.5462 2931072
3.5172 7.5289 9200 3.5450 2996288
3.8408 7.6926 9400 3.5358 3061744
3.7686 7.8563 9600 3.5335 3126896
4.3737 8.0196 9800 3.5273 3191832
3.0142 8.1834 10000 3.5339 3257640
3.2021 8.3471 10200 3.5421 3322584
3.0833 8.5108 10400 3.5248 3387672
3.1155 8.6746 10600 3.5478 3452968
3.4809 8.8383 10800 3.5213 3518104
3.4551 9.0016 11000 3.5289 3583216
3.7315 9.1654 11200 3.5179 3648592
3.2648 9.3291 11400 3.5085 3713808
3.3495 9.4928 11600 3.5319 3778848
4.1403 9.6566 11800 3.5392 3844208
3.3022 9.8203 12000 3.5549 3909264
3.0986 9.9840 12200 3.5357 3974224
3.4615 10.1474 12400 3.5353 4039488
2.9809 10.3111 12600 3.5419 4104512
3.3136 10.4748 12800 3.5455 4169856
4.3368 10.6386 13000 3.5489 4234864
3.4102 10.8023 13200 3.5388 4300144
3.4905 10.9660 13400 3.5782 4365440
3.4071 11.1293 13600 3.5616 4430440
3.1702 11.2931 13800 3.5541 4495784
3.0114 11.4568 14000 3.5549 4560792
3.7312 11.6205 14200 3.5694 4625720
3.6726 11.7843 14400 3.5617 4690744
3.7964 11.9480 14600 3.5560 4756152
4.1136 12.1113 14800 3.5506 4821256
3.3772 12.2751 15000 3.5625 4886344
3.7939 12.4388 15200 3.5634 4951960
3.2353 12.6025 15400 3.5322 5016856
3.1561 12.7663 15600 3.5188 5082248
4.3469 12.9300 15800 3.5596 5147240
3.4963 13.0933 16000 3.5604 5212440
4.1762 13.2571 16200 3.5374 5277800
3.6552 13.4208 16400 3.5589 5342760
3.7178 13.5845 16600 3.5499 5407816
3.3828 13.7483 16800 3.5545 5473672
3.4852 13.9120 17000 3.5650 5538456
3.8798 14.0753 17200 3.5606 5603152
3.1482 14.2391 17400 3.5547 5668048
4.0152 14.4028 17600 3.5595 5732816
4.028 14.5665 17800 3.5676 5798240
4.3121 14.7302 18000 3.5581 5863936
3.5476 14.8940 18200 3.5613 5929216
3.4462 15.0573 18400 3.5455 5994376
3.9775 15.2210 18600 3.5663 6059464
3.3053 15.3848 18800 3.5472 6125240
3.4585 15.5485 19000 3.5601 6190600
3.2784 15.7122 19200 3.5268 6255240
3.101 15.8760 19400 3.5718 6320328
3.7695 16.0393 19600 3.5722 6385240
3.1582 16.2030 19800 3.5270 6450424
4.2664 16.3668 20000 3.5577 6515688
3.0316 16.5305 20200 3.5432 6580712
3.0393 16.6942 20400 3.5413 6646184
3.4592 16.8580 20600 3.5338 6711480
2.918 17.0213 20800 3.5637 6776176
3.9127 17.1850 21000 3.5319 6841120
3.2172 17.3488 21200 3.5316 6906528
3.9372 17.5125 21400 3.5263 6971568
3.3571 17.6762 21600 3.5427 7036832
4.0576 17.8400 21800 3.5578 7102176
3.7445 18.0033 22000 3.5642 7167168
3.5854 18.1670 22200 3.5294 7232736
4.1088 18.3307 22400 3.5527 7297984
3.279 18.4945 22600 3.5682 7362832
2.7462 18.6582 22800 3.5595 7428672
3.8284 18.8219 23000 3.5334 7493504
3.5873 18.9857 23200 3.5301 7558400
3.8182 19.1490 23400 3.5588 7623392
3.7164 19.3127 23600 3.5315 7688624
4.0308 19.4765 23800 3.5314 7753632
3.4377 19.6402 24000 3.5320 7819136
2.9969 19.8039 24200 3.5329 7884272
3.2624 19.9677 24400 3.5258 7949504
3.6723 20.1310 24600 3.5279 8014544
3.8711 20.2947 24800 3.5275 8079920
3.5965 20.4585 25000 3.5295 8145552
3.3086 20.6222 25200 3.5285 8210688
3.6435 20.7859 25400 3.5280 8275760
3.7272 20.9497 25600 3.5323 8340784
4.5898 21.1130 25800 3.5293 8405688
2.9163 21.2767 26000 3.5255 8470664
3.2458 21.4404 26200 3.5270 8535736
3.2172 21.6042 26400 3.5262 8600728
3.9145 21.7679 26600 3.5281 8666296
3.3908 21.9316 26800 3.5281 8731640
3.5334 22.0950 27000 3.5305 8796704
3.8381 22.2587 27200 3.5314 8861792
3.7604 22.4224 27400 3.5314 8927168
3.368 22.5862 27600 3.5317 8992240
2.8802 22.7499 27800 3.5317 9057600
3.6336 22.9136 28000 3.5310 9122992
3.1203 23.0770 28200 3.5304 9187992
3.2907 23.2407 28400 3.5304 9253112
3.8521 23.4044 28600 3.5307 9318440
3.1546 23.5682 28800 3.5246 9383656
3.1028 23.7319 29000 3.5246 9448616
3.5328 23.8956 29200 3.5246 9513976
2.9873 24.0589 29400 3.5257 9579416
3.7107 24.2227 29600 3.5257 9644664
3.4592 24.3864 29800 3.5257 9710056
3.494 24.5501 30000 3.5246 9775272
2.5752 24.7139 30200 3.5246 9840600
3.1539 24.8776 30400 3.5244 9905368
3.1573 25.0409 30600 3.5246 9970160
3.3874 25.2047 30800 3.5246 10035200
3.3958 25.3684 31000 3.5246 10100368
3.9979 25.5321 31200 3.5246 10165552
3.5085 25.6959 31400 3.5246 10230992
4.074 25.8596 31600 3.5246 10295840
3.5245 26.0229 31800 3.5246 10360952
3.5162 26.1867 32000 3.5246 10425832
3.4838 26.3504 32200 3.5246 10490904
4.0574 26.5141 32400 3.5246 10556056
3.6515 26.6779 32600 3.5246 10621432
3.4461 26.8416 32800 3.5246 10686808
3.4493 27.0049 33000 3.5246 10751912
4.016 27.1686 33200 3.5246 10817272
3.758 27.3324 33400 3.5246 10882568
3.3578 27.4961 33600 3.5246 10947368
4.1326 27.6598 33800 3.5246 11012568
3.415 27.8236 34000 3.5246 11078056
3.3055 27.9873 34200 3.5246 11143272
3.8248 28.1506 34400 3.5246 11208128
3.1712 28.3144 34600 3.5246 11273344
3.398 28.4781 34800 3.5246 11338704
3.5494 28.6418 35000 3.5246 11404240
3.4769 28.8056 35200 3.5246 11469056
3.182 28.9693 35400 3.5246 11534288
3.174 29.1326 35600 3.5246 11599248
3.5172 29.2964 35800 3.5246 11664528
3.3028 29.4601 36000 3.5246 11729904
3.6835 29.6238 36200 3.5246 11794928
3.5263 29.7876 36400 3.5246 11860400
3.4762 29.9513 36600 3.5246 11925328
3.2393 30.1146 36800 3.5246 11989944
2.7341 30.2783 37000 3.5246 12054968
3.8151 30.4421 37200 3.5246 12120184
3.5903 30.6058 37400 3.5246 12185832
3.9725 30.7695 37600 3.5246 12250664
3.362 30.9333 37800 3.5246 12315704
3.8829 31.0966 38000 3.5246 12380824
3.8009 31.2603 38200 3.5246 12446424
4.0143 31.4241 38400 3.5246 12511800
3.6845 31.5878 38600 3.5246 12576920
2.9847 31.7515 38800 3.5246 12641896
3.4192 31.9153 39000 3.5246 12706504
4.0683 32.0786 39200 3.5246 12771208
3.2408 32.2423 39400 3.5246 12836760
3.2512 32.4061 39600 3.5246 12901944
3.2646 32.5698 39800 3.5246 12967000
3.9378 32.7335 40000 3.5246 13031928

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1