train_wsc_1745950298

This model is a fine-tuned version of google/gemma-3-1b-it on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2398
  • Num Input Tokens Seen: 14005200

Model description

More information needed

Intended uses & limitations

More information needed
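
Documented usage instructions are not provided, but since this is a PEFT adapter (see the framework versions below), it can most likely be loaded on top of the base model with PEFT. A minimal sketch, assuming the adapter is published on the Hub as rbelanec/train_wsc_1745950298 and targets google/gemma-3-1b-it as stated above; the prompt is a made-up Winograd-style example, since the actual prompt template used in training is not documented:

```python
# Minimal sketch: load the adapter on top of google/gemma-3-1b-it with PEFT.
# The repo id "rbelanec/train_wsc_1745950298" and the prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-3-1b-it"
adapter_id = "rbelanec/train_wsc_1745950298"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical Winograd-style prompt for illustration only.
prompt = "The trophy doesn't fit into the suitcase because it is too large. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```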

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
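
For orientation, these values map onto a Hugging Face TrainingArguments configuration roughly as sketched below. This is not the original training script; the output_dir, eval_strategy, and eval_steps values are assumptions, with eval_steps inferred from the 200-step evaluation interval in the results table.

```python
from transformers import TrainingArguments

# Rough TrainingArguments equivalent of the hyperparameters above (a sketch,
# not the original script). output_dir, eval_strategy, and eval_steps are assumptions.
args = TrainingArguments(
    output_dir="train_wsc_1745950298",
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
    eval_strategy="steps",
    eval_steps=200,  # matches the evaluation interval in the results table
)
```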

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 0.2502 | 1.6024 | 200 | 0.2398 | 70208 |
| 0.2243 | 3.2008 | 400 | 0.2570 | 140304 |
| 0.2314 | 4.8032 | 600 | 0.2445 | 210336 |
| 0.2246 | 6.4016 | 800 | 0.2456 | 280224 |
| 0.2238 | 8.0 | 1000 | 0.2563 | 350448 |
| 0.2056 | 9.6024 | 1200 | 0.3039 | 420560 |
| 0.218 | 11.2008 | 1400 | 0.3033 | 490880 |
| 0.2243 | 12.8032 | 1600 | 0.2909 | 560560 |
| 0.228 | 14.4016 | 1800 | 0.2976 | 630816 |
| 0.2312 | 16.0 | 2000 | 0.3352 | 699936 |
| 0.256 | 17.6024 | 2200 | 0.3305 | 769520 |
| 0.1819 | 19.2008 | 2400 | 0.5937 | 839648 |
| 0.158 | 20.8032 | 2600 | 0.7600 | 910080 |
| 0.1106 | 22.4016 | 2800 | 1.2361 | 979504 |
| 0.1991 | 24.0 | 3000 | 1.0813 | 1049392 |
| 0.1846 | 25.6024 | 3200 | 1.5614 | 1119904 |
| 0.1735 | 27.2008 | 3400 | 2.3810 | 1189264 |
| 0.1509 | 28.8032 | 3600 | 2.0245 | 1259520 |
| 0.0021 | 30.4016 | 3800 | 3.0666 | 1329408 |
| 0.0929 | 32.0 | 4000 | 3.0413 | 1399696 |
| 0.0981 | 33.6024 | 4200 | 3.5872 | 1470240 |
| 0.0002 | 35.2008 | 4400 | 3.5883 | 1539536 |
| 0.0102 | 36.8032 | 4600 | 3.9757 | 1610032 |
| 0.3213 | 38.4016 | 4800 | 4.2087 | 1680240 |
| 0.0963 | 40.0 | 5000 | 4.1447 | 1749472 |
| 0.0002 | 41.6024 | 5200 | 4.0717 | 1819376 |
| 0.0 | 43.2008 | 5400 | 4.1688 | 1889616 |
| 0.0 | 44.8032 | 5600 | 4.2851 | 1959536 |
| 0.0 | 46.4016 | 5800 | 4.2626 | 2028864 |
| 0.0002 | 48.0 | 6000 | 3.9931 | 2099424 |
| 0.0 | 49.6024 | 6200 | 4.0036 | 2169376 |
| 0.0 | 51.2008 | 6400 | 4.0874 | 2239408 |
| 0.0 | 52.8032 | 6600 | 4.1775 | 2309472 |
| 0.0 | 54.4016 | 6800 | 4.4232 | 2380032 |
| 0.0 | 56.0 | 7000 | 4.3323 | 2449376 |
| 0.1357 | 57.6024 | 7200 | 2.3013 | 2519776 |
| 0.0004 | 59.2008 | 7400 | 3.9364 | 2589392 |
| 0.0 | 60.8032 | 7600 | 4.5112 | 2659792 |
| 0.0002 | 62.4016 | 7800 | 4.4699 | 2729184 |
| 0.0 | 64.0 | 8000 | 4.7731 | 2799504 |
| 0.0 | 65.6024 | 8200 | 4.6935 | 2869520 |
| 0.0002 | 67.2008 | 8400 | 4.7713 | 2940080 |
| 0.0 | 68.8032 | 8600 | 4.9666 | 3010256 |
| 0.0 | 70.4016 | 8800 | 5.0120 | 3080304 |
| 0.0 | 72.0 | 9000 | 5.0390 | 3150464 |
| 0.0 | 73.6024 | 9200 | 5.0681 | 3220512 |
| 0.0 | 75.2008 | 9400 | 5.0208 | 3290320 |
| 0.0 | 76.8032 | 9600 | 5.0913 | 3360352 |
| 0.0 | 78.4016 | 9800 | 5.1181 | 3430416 |
| 0.0 | 80.0 | 10000 | 5.1148 | 3500544 |
| 0.0 | 81.6024 | 10200 | 5.1373 | 3570432 |
| 0.0 | 83.2008 | 10400 | 5.1854 | 3640832 |
| 0.0 | 84.8032 | 10600 | 5.1791 | 3710480 |
| 0.0 | 86.4016 | 10800 | 5.1904 | 3780368 |
| 0.0 | 88.0 | 11000 | 5.2121 | 3850720 |
| 0.0 | 89.6024 | 11200 | 5.2214 | 3920848 |
| 0.0 | 91.2008 | 11400 | 5.1889 | 3990784 |
| 0.0 | 92.8032 | 11600 | 5.2617 | 4060432 |
| 0.0 | 94.4016 | 11800 | 5.2567 | 4130528 |
| 0.0 | 96.0 | 12000 | 5.3243 | 4200848 |
| 0.0 | 97.6024 | 12200 | 5.3238 | 4270928 |
| 0.0 | 99.2008 | 12400 | 5.3268 | 4339920 |
| 0.0 | 100.8032 | 12600 | 5.3216 | 4410624 |
| 0.0 | 102.4016 | 12800 | 5.3369 | 4479904 |
| 0.0 | 104.0 | 13000 | 5.3556 | 4549824 |
| 0.0 | 105.6024 | 13200 | 5.3621 | 4620128 |
| 0.0 | 107.2008 | 13400 | 5.4462 | 4690352 |
| 0.0 | 108.8032 | 13600 | 5.4229 | 4760256 |
| 0.0 | 110.4016 | 13800 | 5.3623 | 4830144 |
| 0.0 | 112.0 | 14000 | 5.4414 | 4900080 |
| 0.0 | 113.6024 | 14200 | 5.4651 | 4969936 |
| 0.0 | 115.2008 | 14400 | 5.4911 | 5040096 |
| 0.0 | 116.8032 | 14600 | 5.4978 | 5110288 |
| 0.0 | 118.4016 | 14800 | 5.5403 | 5180208 |
| 0.0 | 120.0 | 15000 | 5.5455 | 5250464 |
| 0.0 | 121.6024 | 15200 | 5.5610 | 5320528 |
| 0.0 | 123.2008 | 15400 | 5.5894 | 5390624 |
| 0.0 | 124.8032 | 15600 | 5.6072 | 5460832 |
| 0.0 | 126.4016 | 15800 | 5.6240 | 5530720 |
| 0.0 | 128.0 | 16000 | 5.6497 | 5600992 |
| 0.0 | 129.6024 | 16200 | 5.6333 | 5672032 |
| 0.0 | 131.2008 | 16400 | 5.6614 | 5740976 |
| 0.0 | 132.8032 | 16600 | 5.6828 | 5811248 |
| 0.0 | 134.4016 | 16800 | 5.6995 | 5881152 |
| 0.0 | 136.0 | 17000 | 5.7738 | 5951136 |
| 0.0 | 137.6024 | 17200 | 5.7470 | 6021136 |
| 0.0 | 139.2008 | 17400 | 5.7591 | 6091696 |
| 0.0 | 140.8032 | 17600 | 5.7855 | 6161472 |
| 0.0 | 142.4016 | 17800 | 5.8064 | 6231760 |
| 0.0 | 144.0 | 18000 | 5.8327 | 6301232 |
| 0.0 | 145.6024 | 18200 | 5.8848 | 6371776 |
| 0.0 | 147.2008 | 18400 | 5.8775 | 6442048 |
| 0.0 | 148.8032 | 18600 | 5.9053 | 6511680 |
| 0.0 | 150.4016 | 18800 | 5.9010 | 6581136 |
| 0.0 | 152.0 | 19000 | 5.9301 | 6651296 |
| 0.0 | 153.6024 | 19200 | 5.9435 | 6721584 |
| 0.0 | 155.2008 | 19400 | 5.9803 | 6791744 |
| 0.0 | 156.8032 | 19600 | 6.0182 | 6862112 |
| 0.0 | 158.4016 | 19800 | 6.0037 | 6931856 |
| 0.0 | 160.0 | 20000 | 6.0110 | 7001952 |
| 0.0 | 161.6024 | 20200 | 5.9660 | 7071568 |
| 0.0 | 163.2008 | 20400 | 6.0137 | 7141584 |
| 0.0 | 164.8032 | 20600 | 6.0390 | 7212096 |
| 0.0 | 166.4016 | 20800 | 6.0555 | 7282736 |
| 0.0 | 168.0 | 21000 | 6.0948 | 7352288 |
| 0.0 | 169.6024 | 21200 | 6.1164 | 7422624 |
| 0.0 | 171.2008 | 21400 | 6.1387 | 7492496 |
| 0.0 | 172.8032 | 21600 | 6.1157 | 7562288 |
| 0.0 | 174.4016 | 21800 | 6.1460 | 7632432 |
| 0.0 | 176.0 | 22000 | 6.1857 | 7702096 |
| 0.0 | 177.6024 | 22200 | 6.1444 | 7772000 |
| 0.0 | 179.2008 | 22400 | 6.1881 | 7842112 |
| 0.0 | 180.8032 | 22600 | 6.2875 | 7912496 |
| 0.0 | 182.4016 | 22800 | 6.2525 | 7982768 |
| 0.0 | 184.0 | 23000 | 6.2246 | 8052448 |
| 0.0 | 185.6024 | 23200 | 6.2503 | 8122832 |
| 0.0 | 187.2008 | 23400 | 6.2291 | 8193088 |
| 0.0 | 188.8032 | 23600 | 6.2625 | 8263104 |
| 0.0 | 190.4016 | 23800 | 6.2605 | 8333312 |
| 0.0 | 192.0 | 24000 | 6.2397 | 8402848 |
| 0.0 | 193.6024 | 24200 | 6.2157 | 8472688 |
| 0.0 | 195.2008 | 24400 | 6.2733 | 8542528 |
| 0.0 | 196.8032 | 24600 | 6.3027 | 8612928 |
| 0.0 | 198.4016 | 24800 | 6.2369 | 8682896 |
| 0.0 | 200.0 | 25000 | 6.3063 | 8752864 |
| 0.0 | 201.6024 | 25200 | 6.2636 | 8823744 |
| 0.0 | 203.2008 | 25400 | 6.2100 | 8893360 |
| 0.0 | 204.8032 | 25600 | 6.2911 | 8963536 |
| 0.0 | 206.4016 | 25800 | 6.2168 | 9033264 |
| 0.0 | 208.0 | 26000 | 6.2600 | 9102880 |
| 0.0 | 209.6024 | 26200 | 6.2668 | 9173088 |
| 0.0 | 211.2008 | 26400 | 6.2681 | 9242752 |
| 0.0 | 212.8032 | 26600 | 6.2854 | 9313008 |
| 0.0 | 214.4016 | 26800 | 6.2501 | 9382592 |
| 0.0 | 216.0 | 27000 | 6.2807 | 9452912 |
| 0.0 | 217.6024 | 27200 | 6.2134 | 9522896 |
| 0.0 | 219.2008 | 27400 | 6.3790 | 9592864 |
| 0.0 | 220.8032 | 27600 | 6.3640 | 9663568 |
| 0.0 | 222.4016 | 27800 | 6.3814 | 9733504 |
| 0.0 | 224.0 | 28000 | 6.3391 | 9803232 |
| 0.0 | 225.6024 | 28200 | 6.4282 | 9872976 |
| 0.0 | 227.2008 | 28400 | 6.4834 | 9943472 |
| 0.0 | 228.8032 | 28600 | 6.5947 | 10013472 |
| 0.0 | 230.4016 | 28800 | 6.5284 | 10082944 |
| 0.0 | 232.0 | 29000 | 6.6673 | 10153120 |
| 0.0 | 233.6024 | 29200 | 6.6531 | 10223856 |
| 0.0 | 235.2008 | 29400 | 6.7943 | 10293888 |
| 0.0 | 236.8032 | 29600 | 6.8080 | 10363824 |
| 0.0 | 238.4016 | 29800 | 6.8269 | 10433056 |
| 0.0 | 240.0 | 30000 | 6.7854 | 10503136 |
| 0.0 | 241.6024 | 30200 | 6.9273 | 10573568 |
| 0.0 | 243.2008 | 30400 | 6.8975 | 10642912 |
| 0.0 | 244.8032 | 30600 | 6.9270 | 10713264 |
| 0.0 | 246.4016 | 30800 | 6.9037 | 10783152 |
| 0.0 | 248.0 | 31000 | 6.9580 | 10853376 |
| 0.0 | 249.6024 | 31200 | 6.8934 | 10923696 |
| 0.0 | 251.2008 | 31400 | 6.9023 | 10994016 |
| 0.0 | 252.8032 | 31600 | 6.8389 | 11063664 |
| 0.0 | 254.4016 | 31800 | 6.7591 | 11133840 |
| 0.0 | 256.0 | 32000 | 6.7549 | 11203504 |
| 0.0 | 257.6024 | 32200 | 6.8300 | 11273840 |
| 0.0 | 259.2008 | 32400 | 6.7702 | 11342832 |
| 0.0 | 260.8032 | 32600 | 6.7095 | 11412832 |
| 0.0 | 262.4016 | 32800 | 6.7570 | 11482880 |
| 0.0 | 264.0 | 33000 | 6.7268 | 11552512 |
| 0.0 | 265.6024 | 33200 | 6.6205 | 11622560 |
| 0.0 | 267.2008 | 33400 | 6.5914 | 11692336 |
| 0.0 | 268.8032 | 33600 | 6.6435 | 11763296 |
| 0.0 | 270.4016 | 33800 | 6.6254 | 11833168 |
| 0.0 | 272.0 | 34000 | 6.5398 | 11902608 |
| 0.0 | 273.6024 | 34200 | 6.4623 | 11973440 |
| 0.0 | 275.2008 | 34400 | 6.5638 | 12042992 |
| 0.0 | 276.8032 | 34600 | 6.5642 | 12113808 |
| 0.0 | 278.4016 | 34800 | 6.5720 | 12183456 |
| 0.0 | 280.0 | 35000 | 6.5277 | 12253312 |
| 0.0 | 281.6024 | 35200 | 6.5080 | 12323712 |
| 0.0 | 283.2008 | 35400 | 6.4282 | 12393344 |
| 0.0 | 284.8032 | 35600 | 6.5433 | 12463296 |
| 0.0 | 286.4016 | 35800 | 6.5506 | 12533712 |
| 0.0 | 288.0 | 36000 | 6.4980 | 12603312 |
| 0.0 | 289.6024 | 36200 | 6.4744 | 12672944 |
| 0.0 | 291.2008 | 36400 | 6.4789 | 12743584 |
| 0.0 | 292.8032 | 36600 | 6.5051 | 12814000 |
| 0.0 | 294.4016 | 36800 | 6.5353 | 12883584 |
| 0.0 | 296.0 | 37000 | 6.4756 | 12954144 |
| 0.0 | 297.6024 | 37200 | 6.5368 | 13024112 |
| 0.0 | 299.2008 | 37400 | 6.5682 | 13094448 |
| 0.0 | 300.8032 | 37600 | 6.5119 | 13164640 |
| 0.0 | 302.4016 | 37800 | 6.4694 | 13234048 |
| 0.0 | 304.0 | 38000 | 6.5104 | 13304512 |
| 0.0 | 305.6024 | 38200 | 6.5197 | 13374272 |
| 0.0 | 307.2008 | 38400 | 6.4882 | 13444512 |
| 0.0 | 308.8032 | 38600 | 6.5518 | 13514848 |
| 0.0 | 310.4016 | 38800 | 6.4864 | 13584800 |
| 0.0 | 312.0 | 39000 | 6.5067 | 13654928 |
| 0.0 | 313.6024 | 39200 | 6.4883 | 13724752 |
| 0.0 | 315.2008 | 39400 | 6.5242 | 13794224 |
| 0.0 | 316.8032 | 39600 | 6.5555 | 13865104 |
| 0.0 | 318.4016 | 39800 | 6.5335 | 13935776 |
| 0.0 | 320.0 | 40000 | 6.5357 | 14005200 |

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
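
A minimal sketch for checking that a local environment matches the versions listed above:

```python
# Print installed versions of the libraries listed above for a quick comparison.
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("Pytorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```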