train_wsc_1745950299

This model is a PEFT adapter fine-tuned from google/gemma-3-1b-it on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 4.9965
  • Num Input Tokens Seen: 14005200

Model description

More information needed

Intended uses & limitations

More information needed
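
Pending a fuller description, here is a minimal loading sketch. It assumes this repo (rbelanec/train_wsc_1745950299) hosts a causal-LM PEFT adapter for google/gemma-3-1b-it, which is what the base-model reference above and the PEFT version listed below suggest; the prompt is illustrative only.

```python
# Minimal sketch: attach the PEFT adapter to its base model and generate.
# The adapter id is this model's repo; adjust for a local checkout.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(base, "rbelanec/train_wsc_1745950299")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

prompt = "The trophy doesn't fit in the suitcase because it is too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```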

Training and evaluation data

More information needed
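
The card only identifies the training data as "the wsc dataset". Assuming this means the SuperGLUE Winograd Schema Challenge (an assumption; the card does not say), the data can be inspected along these lines; the hub id and config below are guesses and, depending on your datasets version, may require a parquet mirror of super_glue:

```python
# Sketch under the assumption that "wsc" means SuperGLUE WSC.
# The dataset id/config are guesses, not taken from this card.
from datasets import load_dataset

wsc = load_dataset("super_glue", "wsc.fixed", split="validation")
print(wsc[0])  # SuperGLUE WSC fields: text, span1_text, span2_text, label, ...
```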

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
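
As noted above, a hedged sketch of how these settings map onto transformers.TrainingArguments; output_dir is a placeholder, and the PEFT adapter config is omitted because the card does not document it:

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# output_dir is a placeholder; the PEFT/LoRA config is not documented here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_1745950299",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    gradient_accumulation_steps=2,  # 2 per device x 2 accumulation = total batch 4
    lr_scheduler_type="cosine",
    max_steps=40000,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```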

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 5.8826 | 1.6024 | 200 | 5.5053 | 70208 |
| 5.0763 | 3.2008 | 400 | 5.3567 | 140304 |
| 4.7646 | 4.8032 | 600 | 5.3163 | 210336 |
| 5.7497 | 6.4016 | 800 | 5.3232 | 280224 |
| 5.7576 | 8.0 | 1000 | 5.2744 | 350448 |
| 5.3493 | 9.6024 | 1200 | 5.2395 | 420560 |
| 5.9306 | 11.2008 | 1400 | 5.2913 | 490880 |
| 5.5849 | 12.8032 | 1600 | 5.2287 | 560560 |
| 5.3923 | 14.4016 | 1800 | 5.2059 | 630816 |
| 5.1131 | 16.0 | 2000 | 5.1597 | 699936 |
| 4.9402 | 17.6024 | 2200 | 5.1741 | 769520 |
| 5.4474 | 19.2008 | 2400 | 5.1759 | 839648 |
| 4.8209 | 20.8032 | 2600 | 5.1446 | 910080 |
| 4.9124 | 22.4016 | 2800 | 5.1089 | 979504 |
| 5.2709 | 24.0 | 3000 | 5.1325 | 1049392 |
| 5.278 | 25.6024 | 3200 | 5.0762 | 1119904 |
| 4.916 | 27.2008 | 3400 | 5.1474 | 1189264 |
| 5.1115 | 28.8032 | 3600 | 5.1005 | 1259520 |
| 5.2598 | 30.4016 | 3800 | 5.0810 | 1329408 |
| 5.4014 | 32.0 | 4000 | 5.0811 | 1399696 |
| 5.419 | 33.6024 | 4200 | 5.0911 | 1470240 |
| 5.7328 | 35.2008 | 4400 | 5.0783 | 1539536 |
| 5.2734 | 36.8032 | 4600 | 5.0743 | 1610032 |
| 5.3228 | 38.4016 | 4800 | 5.0611 | 1680240 |
| 5.9158 | 40.0 | 5000 | 5.0856 | 1749472 |
| 5.3068 | 41.6024 | 5200 | 5.0227 | 1819376 |
| 5.1287 | 43.2008 | 5400 | 5.0778 | 1889616 |
| 5.2446 | 44.8032 | 5600 | 5.0547 | 1959536 |
| 5.2095 | 46.4016 | 5800 | 5.0481 | 2028864 |
| 5.2743 | 48.0 | 6000 | 5.0404 | 2099424 |
| 5.1529 | 49.6024 | 6200 | 5.0544 | 2169376 |
| 5.1871 | 51.2008 | 6400 | 5.0362 | 2239408 |
| 5.2363 | 52.8032 | 6600 | 5.0370 | 2309472 |
| 5.5796 | 54.4016 | 6800 | 5.0583 | 2380032 |
| 4.5613 | 56.0 | 7000 | 5.0546 | 2449376 |
| 5.5949 | 57.6024 | 7200 | 5.0837 | 2519776 |
| 5.4713 | 59.2008 | 7400 | 5.1097 | 2589392 |
| 5.0727 | 60.8032 | 7600 | 5.0747 | 2659792 |
| 4.7446 | 62.4016 | 7800 | 5.0783 | 2729184 |
| 5.3469 | 64.0 | 8000 | 5.0736 | 2799504 |
| 4.921 | 65.6024 | 8200 | 5.0933 | 2869520 |
| 5.0852 | 67.2008 | 8400 | 5.0411 | 2940080 |
| 4.6469 | 68.8032 | 8600 | 5.0502 | 3010256 |
| 5.218 | 70.4016 | 8800 | 5.0291 | 3080304 |
| 5.1953 | 72.0 | 9000 | 5.0702 | 3150464 |
| 4.5804 | 73.6024 | 9200 | 5.0236 | 3220512 |
| 4.8164 | 75.2008 | 9400 | 5.0161 | 3290320 |
| 5.5157 | 76.8032 | 9600 | 5.0176 | 3360352 |
| 5.0423 | 78.4016 | 9800 | 5.0560 | 3430416 |
| 4.7418 | 80.0 | 10000 | 5.0621 | 3500544 |
| 4.4244 | 81.6024 | 10200 | 5.0575 | 3570432 |
| 4.9467 | 83.2008 | 10400 | 5.0453 | 3640832 |
| 5.0881 | 84.8032 | 10600 | 5.0475 | 3710480 |
| 5.0995 | 86.4016 | 10800 | 5.0685 | 3780368 |
| 5.0999 | 88.0 | 11000 | 5.0329 | 3850720 |
| 5.4019 | 89.6024 | 11200 | 5.0374 | 3920848 |
| 5.0643 | 91.2008 | 11400 | 5.0753 | 3990784 |
| 5.2435 | 92.8032 | 11600 | 5.0708 | 4060432 |
| 5.0528 | 94.4016 | 11800 | 5.0673 | 4130528 |
| 5.5103 | 96.0 | 12000 | 5.0910 | 4200848 |
| 5.1448 | 97.6024 | 12200 | 5.1100 | 4270928 |
| 5.2059 | 99.2008 | 12400 | 5.1052 | 4339920 |
| 4.6471 | 100.8032 | 12600 | 5.1017 | 4410624 |
| 4.9262 | 102.4016 | 12800 | 5.0293 | 4479904 |
| 5.2129 | 104.0 | 13000 | 5.0363 | 4549824 |
| 5.0756 | 105.6024 | 13200 | 4.9999 | 4620128 |
| 4.8911 | 107.2008 | 13400 | 5.0197 | 4690352 |
| 5.4105 | 108.8032 | 13600 | 5.0017 | 4760256 |
| 4.6367 | 110.4016 | 13800 | 4.9981 | 4830144 |
| 4.9558 | 112.0 | 14000 | 5.0126 | 4900080 |
| 4.8652 | 113.6024 | 14200 | 4.9965 | 4969936 |
| 4.7695 | 115.2008 | 14400 | 5.0050 | 5040096 |
| 4.9551 | 116.8032 | 14600 | 5.0302 | 5110288 |
| 5.1785 | 118.4016 | 14800 | 5.0197 | 5180208 |
| 5.2527 | 120.0 | 15000 | 5.0144 | 5250464 |
| 5.2254 | 121.6024 | 15200 | 5.0178 | 5320528 |
| 5.5968 | 123.2008 | 15400 | 5.0225 | 5390624 |
| 5.219 | 124.8032 | 15600 | 5.0071 | 5460832 |
| 4.4181 | 126.4016 | 15800 | 5.0124 | 5530720 |
| 4.7678 | 128.0 | 16000 | 5.0128 | 5600992 |
| 4.8807 | 129.6024 | 16200 | 5.0184 | 5672032 |
| 4.771 | 131.2008 | 16400 | 5.0164 | 5740976 |
| 4.8087 | 132.8032 | 16600 | 5.0120 | 5811248 |
| 4.7813 | 134.4016 | 16800 | 5.0046 | 5881152 |
| 5.5101 | 136.0 | 17000 | 5.0140 | 5951136 |
| 4.8141 | 137.6024 | 17200 | 5.0294 | 6021136 |
| 5.2025 | 139.2008 | 17400 | 5.0068 | 6091696 |
| 4.9835 | 140.8032 | 17600 | 5.0054 | 6161472 |
| 4.9103 | 142.4016 | 17800 | 5.0068 | 6231760 |
| 5.8432 | 144.0 | 18000 | 5.0100 | 6301232 |
| 5.6101 | 145.6024 | 18200 | 5.0059 | 6371776 |
| 5.0518 | 147.2008 | 18400 | 5.0231 | 6442048 |
| 5.0497 | 148.8032 | 18600 | 5.0045 | 6511680 |
| 4.5987 | 150.4016 | 18800 | 5.0037 | 6581136 |
| 5.5221 | 152.0 | 19000 | 5.0084 | 6651296 |
| 5.1569 | 153.6024 | 19200 | 5.0084 | 6721584 |
| 5.0575 | 155.2008 | 19400 | 5.0120 | 6791744 |
| 5.2444 | 156.8032 | 19600 | 5.0055 | 6862112 |
| 4.7524 | 158.4016 | 19800 | 5.0055 | 6931856 |
| 4.8124 | 160.0 | 20000 | 5.0074 | 7001952 |
| 5.3737 | 161.6024 | 20200 | 5.0105 | 7071568 |
| 4.8858 | 163.2008 | 20400 | 5.0051 | 7141584 |
| 4.8946 | 164.8032 | 20600 | 5.0105 | 7212096 |
| 4.9381 | 166.4016 | 20800 | 5.0115 | 7282736 |
| 4.8341 | 168.0 | 21000 | 5.0151 | 7352288 |
| 5.3904 | 169.6024 | 21200 | 5.0080 | 7422624 |
| 5.2622 | 171.2008 | 21400 | 5.0105 | 7492496 |
| 5.0821 | 172.8032 | 21600 | 5.0128 | 7562288 |
| 5.4209 | 174.4016 | 21800 | 5.0128 | 7632432 |
| 4.7799 | 176.0 | 22000 | 5.0092 | 7702096 |
| 5.8407 | 177.6024 | 22200 | 5.0092 | 7772000 |
| 5.1688 | 179.2008 | 22400 | 5.0092 | 7842112 |
| 5.2247 | 180.8032 | 22600 | 5.0092 | 7912496 |
| 5.1015 | 182.4016 | 22800 | 5.0129 | 7982768 |
| 5.6092 | 184.0 | 23000 | 5.0129 | 8052448 |
| 5.5411 | 185.6024 | 23200 | 5.0129 | 8122832 |
| 4.979 | 187.2008 | 23400 | 5.0140 | 8193088 |
| 5.157 | 188.8032 | 23600 | 5.0140 | 8263104 |
| 5.009 | 190.4016 | 23800 | 5.0140 | 8333312 |
| 5.591 | 192.0 | 24000 | 5.0140 | 8402848 |
| 5.0195 | 193.6024 | 24200 | 5.0140 | 8472688 |
| 4.8046 | 195.2008 | 24400 | 5.0140 | 8542528 |
| 4.8943 | 196.8032 | 24600 | 5.0140 | 8612928 |
| 5.1195 | 198.4016 | 24800 | 5.0140 | 8682896 |
| 4.5993 | 200.0 | 25000 | 5.0140 | 8752864 |
| 4.9 | 201.6024 | 25200 | 5.0140 | 8823744 |
| 5.1337 | 203.2008 | 25400 | 5.0140 | 8893360 |
| 5.3839 | 204.8032 | 25600 | 5.0140 | 8963536 |
| 4.9969 | 206.4016 | 25800 | 5.0140 | 9033264 |
| 5.2706 | 208.0 | 26000 | 5.0140 | 9102880 |
| 5.072 | 209.6024 | 26200 | 5.0140 | 9173088 |
| 4.8892 | 211.2008 | 26400 | 5.0140 | 9242752 |
| 5.1248 | 212.8032 | 26600 | 5.0140 | 9313008 |
| 5.2002 | 214.4016 | 26800 | 5.0140 | 9382592 |
| 5.1155 | 216.0 | 27000 | 5.0140 | 9452912 |
| 4.5617 | 217.6024 | 27200 | 5.0140 | 9522896 |
| 5.0017 | 219.2008 | 27400 | 5.0140 | 9592864 |
| 5.0964 | 220.8032 | 27600 | 5.0140 | 9663568 |
| 5.1408 | 222.4016 | 27800 | 5.0140 | 9733504 |
| 5.1874 | 224.0 | 28000 | 5.0140 | 9803232 |
| 4.8597 | 225.6024 | 28200 | 5.0140 | 9872976 |
| 5.2342 | 227.2008 | 28400 | 5.0140 | 9943472 |
| 4.9542 | 228.8032 | 28600 | 5.0140 | 10013472 |
| 5.5457 | 230.4016 | 28800 | 5.0140 | 10082944 |
| 5.2678 | 232.0 | 29000 | 5.0140 | 10153120 |
| 5.4961 | 233.6024 | 29200 | 5.0140 | 10223856 |
| 5.5974 | 235.2008 | 29400 | 5.0140 | 10293888 |
| 5.3689 | 236.8032 | 29600 | 5.0140 | 10363824 |
| 5.0799 | 238.4016 | 29800 | 5.0140 | 10433056 |
| 5.4038 | 240.0 | 30000 | 5.0140 | 10503136 |
| 5.5451 | 241.6024 | 30200 | 5.0140 | 10573568 |
| 5.3873 | 243.2008 | 30400 | 5.0140 | 10642912 |
| 5.3173 | 244.8032 | 30600 | 5.0140 | 10713264 |
| 5.2546 | 246.4016 | 30800 | 5.0140 | 10783152 |
| 4.8004 | 248.0 | 31000 | 5.0140 | 10853376 |
| 5.2339 | 249.6024 | 31200 | 5.0140 | 10923696 |
| 5.2339 | 251.2008 | 31400 | 5.0140 | 10994016 |
| 5.6051 | 252.8032 | 31600 | 5.0140 | 11063664 |
| 5.3693 | 254.4016 | 31800 | 5.0140 | 11133840 |
| 5.1762 | 256.0 | 32000 | 5.0140 | 11203504 |
| 5.0229 | 257.6024 | 32200 | 5.0140 | 11273840 |
| 5.1271 | 259.2008 | 32400 | 5.0140 | 11342832 |
| 5.4677 | 260.8032 | 32600 | 5.0140 | 11412832 |
| 4.684 | 262.4016 | 32800 | 5.0140 | 11482880 |
| 4.684 | 264.0 | 33000 | 5.0140 | 11552512 |
| 5.0538 | 265.6024 | 33200 | 5.0140 | 11622560 |
| 5.1218 | 267.2008 | 33400 | 5.0140 | 11692336 |
| 5.2379 | 268.8032 | 33600 | 5.0140 | 11763296 |
| 5.1809 | 270.4016 | 33800 | 5.0140 | 11833168 |
| 5.3555 | 272.0 | 34000 | 5.0140 | 11902608 |
| 5.4007 | 273.6024 | 34200 | 5.0140 | 11973440 |
| 5.1665 | 275.2008 | 34400 | 5.0140 | 12042992 |
| 4.8605 | 276.8032 | 34600 | 5.0140 | 12113808 |
| 5.1055 | 278.4016 | 34800 | 5.0140 | 12183456 |
| 4.3887 | 280.0 | 35000 | 5.0140 | 12253312 |
| 5.1911 | 281.6024 | 35200 | 5.0140 | 12323712 |
| 4.8782 | 283.2008 | 35400 | 5.0140 | 12393344 |
| 5.0216 | 284.8032 | 35600 | 5.0140 | 12463296 |
| 5.3139 | 286.4016 | 35800 | 5.0140 | 12533712 |
| 5.0383 | 288.0 | 36000 | 5.0140 | 12603312 |
| 4.5486 | 289.6024 | 36200 | 5.0140 | 12672944 |
| 4.8665 | 291.2008 | 36400 | 5.0140 | 12743584 |
| 5.4847 | 292.8032 | 36600 | 5.0140 | 12814000 |
| 5.5078 | 294.4016 | 36800 | 5.0140 | 12883584 |
| 4.8833 | 296.0 | 37000 | 5.0140 | 12954144 |
| 5.3515 | 297.6024 | 37200 | 5.0140 | 13024112 |
| 4.9033 | 299.2008 | 37400 | 5.0140 | 13094448 |
| 5.0591 | 300.8032 | 37600 | 5.0140 | 13164640 |
| 5.5834 | 302.4016 | 37800 | 5.0140 | 13234048 |
| 5.2175 | 304.0 | 38000 | 5.0140 | 13304512 |
| 5.1956 | 305.6024 | 38200 | 5.0140 | 13374272 |
| 5.6496 | 307.2008 | 38400 | 5.0140 | 13444512 |
| 5.0242 | 308.8032 | 38600 | 5.0140 | 13514848 |
| 5.3893 | 310.4016 | 38800 | 5.0140 | 13584800 |
| 5.0775 | 312.0 | 39000 | 5.0140 | 13654928 |
| 4.9615 | 313.6024 | 39200 | 5.0140 | 13724752 |
| 4.8723 | 315.2008 | 39400 | 5.0140 | 13794224 |
| 5.1099 | 316.8032 | 39600 | 5.0140 | 13865104 |
| 5.2058 | 318.4016 | 39800 | 5.0140 | 13935776 |
| 5.5803 | 320.0 | 40000 | 5.0140 | 14005200 |
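
The reported validation loss plateaus at 5.0140 from step 23,400 onward, consistent with the cosine schedule driving the learning rate toward zero over the 40,000-step run. For reference, the Validation Loss column is the mean token-level cross-entropy that the Trainer reports; below is a sketch of computing the same quantity for a single input, reusing model and tokenizer from the loading sketch above (the sentence is illustrative, not from the training set):

```python
# Sketch: validation loss above is mean token cross-entropy (in nats).
# Reuses `model` and `tokenizer` from the loading sketch earlier in this card.
import torch

model.eval()
enc = tokenizer("An illustrative WSC-style sentence.", return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"eval-style loss: {loss.item():.4f}")
```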

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1