impossible-llms-spanish-fronting-bigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.6691
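
For context, the reported loss can be converted to perplexity, assuming it is the mean cross-entropy in nats per token (the Transformers default). Training used label smoothing (see the hyperparameters below), which inflates the reported loss, so this is a rough upper bound:

```python
import math

# Reported evaluation loss (mean cross-entropy, nats per token).
eval_loss = 7.6691

# Perplexity = exp(loss). label_smoothing_factor=0.1 inflates the loss,
# so the true model perplexity is somewhat lower than this estimate.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 2141
```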

Model description

A 126M-parameter language model stored as F32 safetensors; further information needed.

Intended uses & limitations

More information needed
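
Pending further documentation, here is a minimal loading and scoring sketch. The repo id and the causal-LM architecture are assumptions inferred from the model name and size, not stated in the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual Hub path of this checkpoint.
repo_id = "impossible-llms-spanish-fronting-bigram"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

# Score a Spanish sentence by its mean per-token cross-entropy.
inputs = tokenizer("El gato duerme en la silla.", return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"per-token loss: {loss.item():.4f}")
```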

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384 (12 per device × 4 devices × 8 gradient-accumulation steps)
  • total_eval_batch_size: 32 (8 per device × 4 devices)
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
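
These settings map directly onto transformers.TrainingArguments. A hedged reconstruction follows; the output directory and any unlisted defaults are assumptions, not taken from the card:

```python
from transformers import TrainingArguments

# Approximate reconstruction of the listed configuration.
args = TrainingArguments(
    output_dir="impossible-llms-spanish-fronting-bigram",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,  # 12 × 4 GPUs × 8 = 384 effective batch
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```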

Training results

Training Loss | Epoch | Step | Validation Loss
51.7865 1.0 8 10.1102
47.2611 2.0 16 9.3475
45.357 3.0 24 9.0172
44.5608 4.0 32 8.8784
43.8003 5.0 40 8.7243
42.8317 6.0 48 8.5317
41.8538 7.0 56 8.3205
40.5846 8.0 64 8.1027
39.4108 9.0 72 7.8955
38.588 10.0 80 7.6823
37.5276 11.0 88 7.4599
36.4055 12.0 96 7.2387
35.2071 13.0 104 7.0278
34.3234 14.0 112 6.8368
33.4358 15.0 120 6.6640
32.7482 16.0 128 6.5664
32.3436 17.0 136 6.4462
31.6928 18.0 144 6.3745
31.5663 19.0 152 6.2989
31.202 20.0 160 6.2500
30.9459 21.0 168 6.2002
30.8036 22.0 176 6.1805
30.4731 23.0 184 6.1291
30.4 24.0 192 6.1024
30.2214 25.0 200 6.0663
30.0188 26.0 208 6.0528
30.0152 27.0 216 6.0160
29.5878 28.0 224 5.9904
29.6721 29.0 232 5.9685
29.4058 30.0 240 5.9469
29.5073 31.0 248 5.9272
29.5181 32.0 256 5.9112
29.2749 33.0 264 5.8886
28.9981 34.0 272 5.8692
28.961 35.0 280 5.8535
28.9255 36.0 288 5.8369
28.6316 37.0 296 5.8209
28.5788 38.0 304 5.8124
28.4946 39.0 312 5.7925
28.2078 40.0 320 5.7885
28.0854 41.0 328 5.7633
28.1322 42.0 336 5.7494
27.9595 43.0 344 5.7403
27.8058 44.0 352 5.7274
27.6942 45.0 360 5.7110
27.5208 46.0 368 5.6964
27.3724 47.0 376 5.6864
27.3083 48.0 384 5.6803
27.1867 49.0 392 5.6681
27.2074 50.0 400 5.6634
27.0896 51.0 408 5.6550
26.8508 52.0 416 5.6404
26.7945 53.0 424 5.6373
26.6411 54.0 432 5.6309
26.5528 55.0 440 5.6305
26.4657 56.0 448 5.6186
26.1433 57.0 456 5.6250
26.2097 58.0 464 5.6179
26.0528 59.0 472 5.6161
26.0273 60.0 480 5.6151
25.7964 61.0 488 5.6193
25.7268 62.0 496 5.6126
25.7201 63.0 504 5.6092
25.3521 64.0 512 5.6123
25.4724 65.0 520 5.6182
25.3514 66.0 528 5.6154
25.1205 67.0 536 5.6159
25.0164 68.0 544 5.6217
24.9465 69.0 552 5.6284
24.9826 70.0 560 5.6356
24.8623 71.0 568 5.6378
24.623 72.0 576 5.6390
24.6527 73.0 584 5.6474
24.5611 74.0 592 5.6584
24.2702 75.0 600 5.6654
24.1436 76.0 608 5.6756
24.0619 77.0 616 5.6812
23.9576 78.0 624 5.6847
23.7242 79.0 632 5.6943
23.636 80.0 640 5.7032
23.6262 81.0 648 5.7136
23.5536 82.0 656 5.7244
23.2478 83.0 664 5.7401
23.0543 84.0 672 5.7457
23.1703 85.0 680 5.7572
22.9263 86.0 688 5.7806
22.82 87.0 696 5.7796
22.8149 88.0 704 5.7937
22.6482 89.0 712 5.8184
22.4487 90.0 720 5.8230
22.4615 91.0 728 5.8316
22.4335 92.0 736 5.8445
22.0866 93.0 744 5.8531
22.2906 94.0 752 5.8705
22.1485 95.0 760 5.8856
22.006 96.0 768 5.8992
21.6728 97.0 776 5.9094
21.6655 98.0 784 5.9253
21.4957 99.0 792 5.9453
21.4518 100.0 800 5.9597
21.2738 101.0 808 5.9564
21.1115 102.0 816 5.9868
20.9962 103.0 824 5.9946
20.9172 104.0 832 5.9991
21.021 105.0 840 6.0163
20.7555 106.0 848 6.0351
20.6887 107.0 856 6.0554
20.6573 108.0 864 6.0588
20.4604 109.0 872 6.0805
20.2688 110.0 880 6.0841
20.2978 111.0 888 6.1073
20.1676 112.0 896 6.1259
20.0047 113.0 904 6.1382
19.987 114.0 912 6.1559
19.7857 115.0 920 6.1672
19.6289 116.0 928 6.1724
19.6443 117.0 936 6.1895
19.5238 118.0 944 6.2129
19.3597 119.0 952 6.2281
19.415 120.0 960 6.2390
19.1758 121.0 968 6.2542
19.1887 122.0 976 6.2636
19.0214 123.0 984 6.2687
18.8736 124.0 992 6.2880
18.8627 125.0 1000 6.3139
18.649 126.0 1008 6.3215
18.7256 127.0 1016 6.3317
18.5088 128.0 1024 6.3568
18.3701 129.0 1032 6.3672
18.455 130.0 1040 6.3873
18.3206 131.0 1048 6.3910
18.0829 132.0 1056 6.4127
18.2003 133.0 1064 6.4212
17.9939 134.0 1072 6.4374
17.8924 135.0 1080 6.4498
17.8084 136.0 1088 6.4619
17.7389 137.0 1096 6.4699
17.7203 138.0 1104 6.4904
17.4608 139.0 1112 6.5023
17.3992 140.0 1120 6.5115
17.449 141.0 1128 6.5301
17.2316 142.0 1136 6.5389
17.2179 143.0 1144 6.5503
17.1479 144.0 1152 6.5604
17.0588 145.0 1160 6.5753
17.02 146.0 1168 6.5853
16.948 147.0 1176 6.6078
16.8155 148.0 1184 6.6165
16.816 149.0 1192 6.6318
16.6817 150.0 1200 6.6428
16.6077 151.0 1208 6.6446
16.4943 152.0 1216 6.6656
16.4805 153.0 1224 6.6699
16.3307 154.0 1232 6.6889
16.2998 155.0 1240 6.7088
16.216 156.0 1248 6.7101
16.1999 157.0 1256 6.7263
16.1607 158.0 1264 6.7324
16.0138 159.0 1272 6.7500
15.9838 160.0 1280 6.7589
15.8358 161.0 1288 6.7738
15.7344 162.0 1296 6.7835
15.7017 163.0 1304 6.7875
15.7405 164.0 1312 6.8034
15.6348 165.0 1320 6.8117
15.459 166.0 1328 6.8194
15.455 167.0 1336 6.8286
15.4361 168.0 1344 6.8453
15.3021 169.0 1352 6.8571
15.2873 170.0 1360 6.8656
15.1446 171.0 1368 6.8757
15.1495 172.0 1376 6.8746
15.1415 173.0 1384 6.8952
15.0827 174.0 1392 6.8990
14.9722 175.0 1400 6.9115
14.8555 176.0 1408 6.9335
14.9168 177.0 1416 6.9373
14.8235 178.0 1424 6.9553
14.7121 179.0 1432 6.9536
14.7223 180.0 1440 6.9653
14.6185 181.0 1448 6.9858
14.5313 182.0 1456 6.9843
14.4481 183.0 1464 7.0015
14.4247 184.0 1472 7.0101
14.4043 185.0 1480 7.0139
14.1844 186.0 1488 7.0221
14.2311 187.0 1496 7.0356
14.1971 188.0 1504 7.0413
14.1968 189.0 1512 7.0568
14.161 190.0 1520 7.0636
14.0771 191.0 1528 7.0697
14.0347 192.0 1536 7.0855
13.9711 193.0 1544 7.0787
13.9664 194.0 1552 7.0935
13.7762 195.0 1560 7.1017
13.8125 196.0 1568 7.1168
13.7696 197.0 1576 7.1160
13.7972 198.0 1584 7.1348
13.6743 199.0 1592 7.1384
13.5608 200.0 1600 7.1393
13.5463 201.0 1608 7.1606
13.5895 202.0 1616 7.1575
13.4692 203.0 1624 7.1588
13.4569 204.0 1632 7.1826
13.4296 205.0 1640 7.1860
13.4267 206.0 1648 7.1967
13.279 207.0 1656 7.2000
13.2424 208.0 1664 7.2013
13.2571 209.0 1672 7.2114
13.1909 210.0 1680 7.2176
13.165 211.0 1688 7.2361
13.0753 212.0 1696 7.2373
13.0506 213.0 1704 7.2349
13.061 214.0 1712 7.2490
12.9729 215.0 1720 7.2604
12.8804 216.0 1728 7.2651
12.8953 217.0 1736 7.2707
12.8795 218.0 1744 7.2770
12.859 219.0 1752 7.2860
12.804 220.0 1760 7.2816
12.8023 221.0 1768 7.2949
12.7356 222.0 1776 7.3053
12.6938 223.0 1784 7.3143
12.7365 224.0 1792 7.3196
12.6243 225.0 1800 7.3240
12.533 226.0 1808 7.3305
12.5463 227.0 1816 7.3293
12.5938 228.0 1824 7.3394
12.5622 229.0 1832 7.3468
12.5007 230.0 1840 7.3586
12.3205 231.0 1848 7.3596
12.4015 232.0 1856 7.3618
12.3483 233.0 1864 7.3741
12.2927 234.0 1872 7.3764
12.2057 235.0 1880 7.3808
12.3218 236.0 1888 7.3861
12.2043 237.0 1896 7.3882
12.1129 238.0 1904 7.4035
12.1093 239.0 1912 7.4038
12.103 240.0 1920 7.4082
12.1098 241.0 1928 7.4157
12.1093 242.0 1936 7.4137
12.0219 243.0 1944 7.4170
12.065 244.0 1952 7.4305
12.0445 245.0 1960 7.4326
11.9507 246.0 1968 7.4349
12.0039 247.0 1976 7.4433
11.9653 248.0 1984 7.4483
11.9137 249.0 1992 7.4565
11.9049 250.0 2000 7.4586
11.8611 251.0 2008 7.4617
11.8262 252.0 2016 7.4610
11.7463 253.0 2024 7.4627
11.7786 254.0 2032 7.4662
11.7193 255.0 2040 7.4807
11.7704 256.0 2048 7.4819
11.6749 257.0 2056 7.4884
11.6517 258.0 2064 7.4897
11.6551 259.0 2072 7.4966
11.629 260.0 2080 7.4995
11.6125 261.0 2088 7.5013
11.634 262.0 2096 7.5066
11.544 263.0 2104 7.5116
11.5611 264.0 2112 7.5148
11.5625 265.0 2120 7.5163
11.4926 266.0 2128 7.5227
11.4875 267.0 2136 7.5220
11.4771 268.0 2144 7.5214
11.4353 269.0 2152 7.5298
11.4137 270.0 2160 7.5368
11.3471 271.0 2168 7.5347
11.4228 272.0 2176 7.5469
11.3343 273.0 2184 7.5428
11.3997 274.0 2192 7.5504
11.3038 275.0 2200 7.5516
11.3 276.0 2208 7.5554
11.2164 277.0 2216 7.5531
11.2416 278.0 2224 7.5563
11.2725 279.0 2232 7.5633
11.2514 280.0 2240 7.5727
11.2325 281.0 2248 7.5684
11.1597 282.0 2256 7.5722
11.1741 283.0 2264 7.5684
11.1548 284.0 2272 7.5771
11.1209 285.0 2280 7.5747
11.2112 286.0 2288 7.5805
11.0399 287.0 2296 7.5861
11.1127 288.0 2304 7.5815
11.0834 289.0 2312 7.5894
11.15 290.0 2320 7.5891
11.0684 291.0 2328 7.5959
11.0543 292.0 2336 7.5972
11.072 293.0 2344 7.6032
10.9922 294.0 2352 7.5986
11.0509 295.0 2360 7.6043
10.9953 296.0 2368 7.6054
11.0534 297.0 2376 7.6097
10.9692 298.0 2384 7.6076
11.0021 299.0 2392 7.6094
10.9839 300.0 2400 7.6158
11.0022 301.0 2408 7.6157
10.9704 302.0 2416 7.6166
10.9331 303.0 2424 7.6185
10.9065 304.0 2432 7.6253
10.9971 305.0 2440 7.6266
10.9224 306.0 2448 7.6242
10.9332 307.0 2456 7.6250
10.8779 308.0 2464 7.6276
10.9014 309.0 2472 7.6266
10.9626 310.0 2480 7.6298
10.8851 311.0 2488 7.6332
10.852 312.0 2496 7.6328
10.8442 313.0 2504 7.6362
10.8102 314.0 2512 7.6375
10.8672 315.0 2520 7.6353
10.8428 316.0 2528 7.6398
10.8679 317.0 2536 7.6415
10.8335 318.0 2544 7.6428
10.7906 319.0 2552 7.6477
10.7862 320.0 2560 7.6464
10.7698 321.0 2568 7.6465
10.8097 322.0 2576 7.6473
10.7873 323.0 2584 7.6485
10.7952 324.0 2592 7.6493
10.7948 325.0 2600 7.6496
10.7771 326.0 2608 7.6484
10.7632 327.0 2616 7.6516
10.7615 328.0 2624 7.6518
10.7077 329.0 2632 7.6549
10.8326 330.0 2640 7.6559
10.7826 331.0 2648 7.6583
10.7929 332.0 2656 7.6576
10.7658 333.0 2664 7.6575
10.7309 334.0 2672 7.6600
10.6781 335.0 2680 7.6601
10.708 336.0 2688 7.6607
10.7652 337.0 2696 7.6635
10.7291 338.0 2704 7.6599
10.7369 339.0 2712 7.6621
10.6781 340.0 2720 7.6630
10.7016 341.0 2728 7.6603
10.6927 342.0 2736 7.6639
10.677 343.0 2744 7.6630
10.7185 344.0 2752 7.6664
10.7302 345.0 2760 7.6640
10.6203 346.0 2768 7.6669
10.6795 347.0 2776 7.6659
10.6393 348.0 2784 7.6649
10.6661 349.0 2792 7.6653
10.6678 350.0 2800 7.6657
10.6722 351.0 2808 7.6663
10.682 352.0 2816 7.6673
10.6311 353.0 2824 7.6666
10.6867 354.0 2832 7.6680
10.6871 355.0 2840 7.6688
10.6724 356.0 2848 7.6683
10.6331 357.0 2856 7.6679
10.6949 358.0 2864 7.6674
10.5949 359.0 2872 7.6675
10.6906 360.0 2880 7.6682
10.6349 361.0 2888 7.6682
10.5854 362.0 2896 7.6684
10.678 363.0 2904 7.6685
10.6793 364.0 2912 7.6687
10.6762 365.0 2920 7.6690
10.729 366.0 2928 7.6691
10.6483 367.0 2936 7.6689
10.6422 368.0 2944 7.6690
10.6841 369.0 2952 7.6690
10.6743 370.0 2960 7.6690
10.6153 371.0 2968 7.6691
10.678 372.0 2976 7.6691
10.6384 373.0 2984 7.6691
10.6086 374.0 2992 7.6691
10.6207 375.0 3000 7.6691
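
Validation loss bottoms out at 5.6092 (epoch 63, step 504) and then climbs steadily to the final 7.6691 while training loss keeps falling, a classic overfitting pattern. If the last checkpoint is not the intended artifact, the usual Transformers remedy is best-checkpoint selection with early stopping; a minimal sketch with illustrative settings (not from this card):

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Keep the checkpoint with the lowest eval loss and stop training once
# eval loss has failed to improve for 5 consecutive evaluations.
args = TrainingArguments(
    output_dir="out",  # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
stopper = EarlyStoppingCallback(early_stopping_patience=5)
# Pass args and callbacks=[stopper] to Trainer along with the model and datasets.
```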

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
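
To reproduce this environment, pinning the listed versions should be enough; the CUDA 12.1 wheel index is inferred from the "+cu121" tag:

```
pip install transformers==4.49.0 datasets==3.4.0 tokenizers==0.21.0
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```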