impossible-llms-spanish-fronting-bigram
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 7.6691
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 12
- eval_batch_size: 8
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 384
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 3000
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
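The derived values in the list above follow from the per-device settings, and the short sketch below reproduces that arithmetic. It also illustrates one common shape for a cosine schedule with warmup: a linear ramp over the first 10% of steps, then a half-cosine decay to zero. That exact shape is an assumption, since the card states only `cosine` and a warmup ratio of 0.1. The variable names and the `lr_at` helper are illustrative and are not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 12
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 8
learning_rate = 1e-4
training_steps = 3000
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Number of warmup steps implied by the warmup ratio.
warmup_steps = round(training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return learning_rate * step / warmup_steps
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
for step in (0, 150, 300, 1650, 3000):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 0.0001 at step 300 and decays to zero by step 3000.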
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
51.7865 | 1.0 | 8 | 10.1102 |
47.2611 | 2.0 | 16 | 9.3475 |
45.357 | 3.0 | 24 | 9.0172 |
44.5608 | 4.0 | 32 | 8.8784 |
43.8003 | 5.0 | 40 | 8.7243 |
42.8317 | 6.0 | 48 | 8.5317 |
41.8538 | 7.0 | 56 | 8.3205 |
40.5846 | 8.0 | 64 | 8.1027 |
39.4108 | 9.0 | 72 | 7.8955 |
38.588 | 10.0 | 80 | 7.6823 |
37.5276 | 11.0 | 88 | 7.4599 |
36.4055 | 12.0 | 96 | 7.2387 |
35.2071 | 13.0 | 104 | 7.0278 |
34.3234 | 14.0 | 112 | 6.8368 |
33.4358 | 15.0 | 120 | 6.6640 |
32.7482 | 16.0 | 128 | 6.5664 |
32.3436 | 17.0 | 136 | 6.4462 |
31.6928 | 18.0 | 144 | 6.3745 |
31.5663 | 19.0 | 152 | 6.2989 |
31.202 | 20.0 | 160 | 6.2500 |
30.9459 | 21.0 | 168 | 6.2002 |
30.8036 | 22.0 | 176 | 6.1805 |
30.4731 | 23.0 | 184 | 6.1291 |
30.4 | 24.0 | 192 | 6.1024 |
30.2214 | 25.0 | 200 | 6.0663 |
30.0188 | 26.0 | 208 | 6.0528 |
30.0152 | 27.0 | 216 | 6.0160 |
29.5878 | 28.0 | 224 | 5.9904 |
29.6721 | 29.0 | 232 | 5.9685 |
29.4058 | 30.0 | 240 | 5.9469 |
29.5073 | 31.0 | 248 | 5.9272 |
29.5181 | 32.0 | 256 | 5.9112 |
29.2749 | 33.0 | 264 | 5.8886 |
28.9981 | 34.0 | 272 | 5.8692 |
28.961 | 35.0 | 280 | 5.8535 |
28.9255 | 36.0 | 288 | 5.8369 |
28.6316 | 37.0 | 296 | 5.8209 |
28.5788 | 38.0 | 304 | 5.8124 |
28.4946 | 39.0 | 312 | 5.7925 |
28.2078 | 40.0 | 320 | 5.7885 |
28.0854 | 41.0 | 328 | 5.7633 |
28.1322 | 42.0 | 336 | 5.7494 |
27.9595 | 43.0 | 344 | 5.7403 |
27.8058 | 44.0 | 352 | 5.7274 |
27.6942 | 45.0 | 360 | 5.7110 |
27.5208 | 46.0 | 368 | 5.6964 |
27.3724 | 47.0 | 376 | 5.6864 |
27.3083 | 48.0 | 384 | 5.6803 |
27.1867 | 49.0 | 392 | 5.6681 |
27.2074 | 50.0 | 400 | 5.6634 |
27.0896 | 51.0 | 408 | 5.6550 |
26.8508 | 52.0 | 416 | 5.6404 |
26.7945 | 53.0 | 424 | 5.6373 |
26.6411 | 54.0 | 432 | 5.6309 |
26.5528 | 55.0 | 440 | 5.6305 |
26.4657 | 56.0 | 448 | 5.6186 |
26.1433 | 57.0 | 456 | 5.6250 |
26.2097 | 58.0 | 464 | 5.6179 |
26.0528 | 59.0 | 472 | 5.6161 |
26.0273 | 60.0 | 480 | 5.6151 |
25.7964 | 61.0 | 488 | 5.6193 |
25.7268 | 62.0 | 496 | 5.6126 |
25.7201 | 63.0 | 504 | 5.6092 |
25.3521 | 64.0 | 512 | 5.6123 |
25.4724 | 65.0 | 520 | 5.6182 |
25.3514 | 66.0 | 528 | 5.6154 |
25.1205 | 67.0 | 536 | 5.6159 |
25.0164 | 68.0 | 544 | 5.6217 |
24.9465 | 69.0 | 552 | 5.6284 |
24.9826 | 70.0 | 560 | 5.6356 |
24.8623 | 71.0 | 568 | 5.6378 |
24.623 | 72.0 | 576 | 5.6390 |
24.6527 | 73.0 | 584 | 5.6474 |
24.5611 | 74.0 | 592 | 5.6584 |
24.2702 | 75.0 | 600 | 5.6654 |
24.1436 | 76.0 | 608 | 5.6756 |
24.0619 | 77.0 | 616 | 5.6812 |
23.9576 | 78.0 | 624 | 5.6847 |
23.7242 | 79.0 | 632 | 5.6943 |
23.636 | 80.0 | 640 | 5.7032 |
23.6262 | 81.0 | 648 | 5.7136 |
23.5536 | 82.0 | 656 | 5.7244 |
23.2478 | 83.0 | 664 | 5.7401 |
23.0543 | 84.0 | 672 | 5.7457 |
23.1703 | 85.0 | 680 | 5.7572 |
22.9263 | 86.0 | 688 | 5.7806 |
22.82 | 87.0 | 696 | 5.7796 |
22.8149 | 88.0 | 704 | 5.7937 |
22.6482 | 89.0 | 712 | 5.8184 |
22.4487 | 90.0 | 720 | 5.8230 |
22.4615 | 91.0 | 728 | 5.8316 |
22.4335 | 92.0 | 736 | 5.8445 |
22.0866 | 93.0 | 744 | 5.8531 |
22.2906 | 94.0 | 752 | 5.8705 |
22.1485 | 95.0 | 760 | 5.8856 |
22.006 | 96.0 | 768 | 5.8992 |
21.6728 | 97.0 | 776 | 5.9094 |
21.6655 | 98.0 | 784 | 5.9253 |
21.4957 | 99.0 | 792 | 5.9453 |
21.4518 | 100.0 | 800 | 5.9597 |
21.2738 | 101.0 | 808 | 5.9564 |
21.1115 | 102.0 | 816 | 5.9868 |
20.9962 | 103.0 | 824 | 5.9946 |
20.9172 | 104.0 | 832 | 5.9991 |
21.021 | 105.0 | 840 | 6.0163 |
20.7555 | 106.0 | 848 | 6.0351 |
20.6887 | 107.0 | 856 | 6.0554 |
20.6573 | 108.0 | 864 | 6.0588 |
20.4604 | 109.0 | 872 | 6.0805 |
20.2688 | 110.0 | 880 | 6.0841 |
20.2978 | 111.0 | 888 | 6.1073 |
20.1676 | 112.0 | 896 | 6.1259 |
20.0047 | 113.0 | 904 | 6.1382 |
19.987 | 114.0 | 912 | 6.1559 |
19.7857 | 115.0 | 920 | 6.1672 |
19.6289 | 116.0 | 928 | 6.1724 |
19.6443 | 117.0 | 936 | 6.1895 |
19.5238 | 118.0 | 944 | 6.2129 |
19.3597 | 119.0 | 952 | 6.2281 |
19.415 | 120.0 | 960 | 6.2390 |
19.1758 | 121.0 | 968 | 6.2542 |
19.1887 | 122.0 | 976 | 6.2636 |
19.0214 | 123.0 | 984 | 6.2687 |
18.8736 | 124.0 | 992 | 6.2880 |
18.8627 | 125.0 | 1000 | 6.3139 |
18.649 | 126.0 | 1008 | 6.3215 |
18.7256 | 127.0 | 1016 | 6.3317 |
18.5088 | 128.0 | 1024 | 6.3568 |
18.3701 | 129.0 | 1032 | 6.3672 |
18.455 | 130.0 | 1040 | 6.3873 |
18.3206 | 131.0 | 1048 | 6.3910 |
18.0829 | 132.0 | 1056 | 6.4127 |
18.2003 | 133.0 | 1064 | 6.4212 |
17.9939 | 134.0 | 1072 | 6.4374 |
17.8924 | 135.0 | 1080 | 6.4498 |
17.8084 | 136.0 | 1088 | 6.4619 |
17.7389 | 137.0 | 1096 | 6.4699 |
17.7203 | 138.0 | 1104 | 6.4904 |
17.4608 | 139.0 | 1112 | 6.5023 |
17.3992 | 140.0 | 1120 | 6.5115 |
17.449 | 141.0 | 1128 | 6.5301 |
17.2316 | 142.0 | 1136 | 6.5389 |
17.2179 | 143.0 | 1144 | 6.5503 |
17.1479 | 144.0 | 1152 | 6.5604 |
17.0588 | 145.0 | 1160 | 6.5753 |
17.02 | 146.0 | 1168 | 6.5853 |
16.948 | 147.0 | 1176 | 6.6078 |
16.8155 | 148.0 | 1184 | 6.6165 |
16.816 | 149.0 | 1192 | 6.6318 |
16.6817 | 150.0 | 1200 | 6.6428 |
16.6077 | 151.0 | 1208 | 6.6446 |
16.4943 | 152.0 | 1216 | 6.6656 |
16.4805 | 153.0 | 1224 | 6.6699 |
16.3307 | 154.0 | 1232 | 6.6889 |
16.2998 | 155.0 | 1240 | 6.7088 |
16.216 | 156.0 | 1248 | 6.7101 |
16.1999 | 157.0 | 1256 | 6.7263 |
16.1607 | 158.0 | 1264 | 6.7324 |
16.0138 | 159.0 | 1272 | 6.7500 |
15.9838 | 160.0 | 1280 | 6.7589 |
15.8358 | 161.0 | 1288 | 6.7738 |
15.7344 | 162.0 | 1296 | 6.7835 |
15.7017 | 163.0 | 1304 | 6.7875 |
15.7405 | 164.0 | 1312 | 6.8034 |
15.6348 | 165.0 | 1320 | 6.8117 |
15.459 | 166.0 | 1328 | 6.8194 |
15.455 | 167.0 | 1336 | 6.8286 |
15.4361 | 168.0 | 1344 | 6.8453 |
15.3021 | 169.0 | 1352 | 6.8571 |
15.2873 | 170.0 | 1360 | 6.8656 |
15.1446 | 171.0 | 1368 | 6.8757 |
15.1495 | 172.0 | 1376 | 6.8746 |
15.1415 | 173.0 | 1384 | 6.8952 |
15.0827 | 174.0 | 1392 | 6.8990 |
14.9722 | 175.0 | 1400 | 6.9115 |
14.8555 | 176.0 | 1408 | 6.9335 |
14.9168 | 177.0 | 1416 | 6.9373 |
14.8235 | 178.0 | 1424 | 6.9553 |
14.7121 | 179.0 | 1432 | 6.9536 |
14.7223 | 180.0 | 1440 | 6.9653 |
14.6185 | 181.0 | 1448 | 6.9858 |
14.5313 | 182.0 | 1456 | 6.9843 |
14.4481 | 183.0 | 1464 | 7.0015 |
14.4247 | 184.0 | 1472 | 7.0101 |
14.4043 | 185.0 | 1480 | 7.0139 |
14.1844 | 186.0 | 1488 | 7.0221 |
14.2311 | 187.0 | 1496 | 7.0356 |
14.1971 | 188.0 | 1504 | 7.0413 |
14.1968 | 189.0 | 1512 | 7.0568 |
14.161 | 190.0 | 1520 | 7.0636 |
14.0771 | 191.0 | 1528 | 7.0697 |
14.0347 | 192.0 | 1536 | 7.0855 |
13.9711 | 193.0 | 1544 | 7.0787 |
13.9664 | 194.0 | 1552 | 7.0935 |
13.7762 | 195.0 | 1560 | 7.1017 |
13.8125 | 196.0 | 1568 | 7.1168 |
13.7696 | 197.0 | 1576 | 7.1160 |
13.7972 | 198.0 | 1584 | 7.1348 |
13.6743 | 199.0 | 1592 | 7.1384 |
13.5608 | 200.0 | 1600 | 7.1393 |
13.5463 | 201.0 | 1608 | 7.1606 |
13.5895 | 202.0 | 1616 | 7.1575 |
13.4692 | 203.0 | 1624 | 7.1588 |
13.4569 | 204.0 | 1632 | 7.1826 |
13.4296 | 205.0 | 1640 | 7.1860 |
13.4267 | 206.0 | 1648 | 7.1967 |
13.279 | 207.0 | 1656 | 7.2000 |
13.2424 | 208.0 | 1664 | 7.2013 |
13.2571 | 209.0 | 1672 | 7.2114 |
13.1909 | 210.0 | 1680 | 7.2176 |
13.165 | 211.0 | 1688 | 7.2361 |
13.0753 | 212.0 | 1696 | 7.2373 |
13.0506 | 213.0 | 1704 | 7.2349 |
13.061 | 214.0 | 1712 | 7.2490 |
12.9729 | 215.0 | 1720 | 7.2604 |
12.8804 | 216.0 | 1728 | 7.2651 |
12.8953 | 217.0 | 1736 | 7.2707 |
12.8795 | 218.0 | 1744 | 7.2770 |
12.859 | 219.0 | 1752 | 7.2860 |
12.804 | 220.0 | 1760 | 7.2816 |
12.8023 | 221.0 | 1768 | 7.2949 |
12.7356 | 222.0 | 1776 | 7.3053 |
12.6938 | 223.0 | 1784 | 7.3143 |
12.7365 | 224.0 | 1792 | 7.3196 |
12.6243 | 225.0 | 1800 | 7.3240 |
12.533 | 226.0 | 1808 | 7.3305 |
12.5463 | 227.0 | 1816 | 7.3293 |
12.5938 | 228.0 | 1824 | 7.3394 |
12.5622 | 229.0 | 1832 | 7.3468 |
12.5007 | 230.0 | 1840 | 7.3586 |
12.3205 | 231.0 | 1848 | 7.3596 |
12.4015 | 232.0 | 1856 | 7.3618 |
12.3483 | 233.0 | 1864 | 7.3741 |
12.2927 | 234.0 | 1872 | 7.3764 |
12.2057 | 235.0 | 1880 | 7.3808 |
12.3218 | 236.0 | 1888 | 7.3861 |
12.2043 | 237.0 | 1896 | 7.3882 |
12.1129 | 238.0 | 1904 | 7.4035 |
12.1093 | 239.0 | 1912 | 7.4038 |
12.103 | 240.0 | 1920 | 7.4082 |
12.1098 | 241.0 | 1928 | 7.4157 |
12.1093 | 242.0 | 1936 | 7.4137 |
12.0219 | 243.0 | 1944 | 7.4170 |
12.065 | 244.0 | 1952 | 7.4305 |
12.0445 | 245.0 | 1960 | 7.4326 |
11.9507 | 246.0 | 1968 | 7.4349 |
12.0039 | 247.0 | 1976 | 7.4433 |
11.9653 | 248.0 | 1984 | 7.4483 |
11.9137 | 249.0 | 1992 | 7.4565 |
11.9049 | 250.0 | 2000 | 7.4586 |
11.8611 | 251.0 | 2008 | 7.4617 |
11.8262 | 252.0 | 2016 | 7.4610 |
11.7463 | 253.0 | 2024 | 7.4627 |
11.7786 | 254.0 | 2032 | 7.4662 |
11.7193 | 255.0 | 2040 | 7.4807 |
11.7704 | 256.0 | 2048 | 7.4819 |
11.6749 | 257.0 | 2056 | 7.4884 |
11.6517 | 258.0 | 2064 | 7.4897 |
11.6551 | 259.0 | 2072 | 7.4966 |
11.629 | 260.0 | 2080 | 7.4995 |
11.6125 | 261.0 | 2088 | 7.5013 |
11.634 | 262.0 | 2096 | 7.5066 |
11.544 | 263.0 | 2104 | 7.5116 |
11.5611 | 264.0 | 2112 | 7.5148 |
11.5625 | 265.0 | 2120 | 7.5163 |
11.4926 | 266.0 | 2128 | 7.5227 |
11.4875 | 267.0 | 2136 | 7.5220 |
11.4771 | 268.0 | 2144 | 7.5214 |
11.4353 | 269.0 | 2152 | 7.5298 |
11.4137 | 270.0 | 2160 | 7.5368 |
11.3471 | 271.0 | 2168 | 7.5347 |
11.4228 | 272.0 | 2176 | 7.5469 |
11.3343 | 273.0 | 2184 | 7.5428 |
11.3997 | 274.0 | 2192 | 7.5504 |
11.3038 | 275.0 | 2200 | 7.5516 |
11.3 | 276.0 | 2208 | 7.5554 |
11.2164 | 277.0 | 2216 | 7.5531 |
11.2416 | 278.0 | 2224 | 7.5563 |
11.2725 | 279.0 | 2232 | 7.5633 |
11.2514 | 280.0 | 2240 | 7.5727 |
11.2325 | 281.0 | 2248 | 7.5684 |
11.1597 | 282.0 | 2256 | 7.5722 |
11.1741 | 283.0 | 2264 | 7.5684 |
11.1548 | 284.0 | 2272 | 7.5771 |
11.1209 | 285.0 | 2280 | 7.5747 |
11.2112 | 286.0 | 2288 | 7.5805 |
11.0399 | 287.0 | 2296 | 7.5861 |
11.1127 | 288.0 | 2304 | 7.5815 |
11.0834 | 289.0 | 2312 | 7.5894 |
11.15 | 290.0 | 2320 | 7.5891 |
11.0684 | 291.0 | 2328 | 7.5959 |
11.0543 | 292.0 | 2336 | 7.5972 |
11.072 | 293.0 | 2344 | 7.6032 |
10.9922 | 294.0 | 2352 | 7.5986 |
11.0509 | 295.0 | 2360 | 7.6043 |
10.9953 | 296.0 | 2368 | 7.6054 |
11.0534 | 297.0 | 2376 | 7.6097 |
10.9692 | 298.0 | 2384 | 7.6076 |
11.0021 | 299.0 | 2392 | 7.6094 |
10.9839 | 300.0 | 2400 | 7.6158 |
11.0022 | 301.0 | 2408 | 7.6157 |
10.9704 | 302.0 | 2416 | 7.6166 |
10.9331 | 303.0 | 2424 | 7.6185 |
10.9065 | 304.0 | 2432 | 7.6253 |
10.9971 | 305.0 | 2440 | 7.6266 |
10.9224 | 306.0 | 2448 | 7.6242 |
10.9332 | 307.0 | 2456 | 7.6250 |
10.8779 | 308.0 | 2464 | 7.6276 |
10.9014 | 309.0 | 2472 | 7.6266 |
10.9626 | 310.0 | 2480 | 7.6298 |
10.8851 | 311.0 | 2488 | 7.6332 |
10.852 | 312.0 | 2496 | 7.6328 |
10.8442 | 313.0 | 2504 | 7.6362 |
10.8102 | 314.0 | 2512 | 7.6375 |
10.8672 | 315.0 | 2520 | 7.6353 |
10.8428 | 316.0 | 2528 | 7.6398 |
10.8679 | 317.0 | 2536 | 7.6415 |
10.8335 | 318.0 | 2544 | 7.6428 |
10.7906 | 319.0 | 2552 | 7.6477 |
10.7862 | 320.0 | 2560 | 7.6464 |
10.7698 | 321.0 | 2568 | 7.6465 |
10.8097 | 322.0 | 2576 | 7.6473 |
10.7873 | 323.0 | 2584 | 7.6485 |
10.7952 | 324.0 | 2592 | 7.6493 |
10.7948 | 325.0 | 2600 | 7.6496 |
10.7771 | 326.0 | 2608 | 7.6484 |
10.7632 | 327.0 | 2616 | 7.6516 |
10.7615 | 328.0 | 2624 | 7.6518 |
10.7077 | 329.0 | 2632 | 7.6549 |
10.8326 | 330.0 | 2640 | 7.6559 |
10.7826 | 331.0 | 2648 | 7.6583 |
10.7929 | 332.0 | 2656 | 7.6576 |
10.7658 | 333.0 | 2664 | 7.6575 |
10.7309 | 334.0 | 2672 | 7.6600 |
10.6781 | 335.0 | 2680 | 7.6601 |
10.708 | 336.0 | 2688 | 7.6607 |
10.7652 | 337.0 | 2696 | 7.6635 |
10.7291 | 338.0 | 2704 | 7.6599 |
10.7369 | 339.0 | 2712 | 7.6621 |
10.6781 | 340.0 | 2720 | 7.6630 |
10.7016 | 341.0 | 2728 | 7.6603 |
10.6927 | 342.0 | 2736 | 7.6639 |
10.677 | 343.0 | 2744 | 7.6630 |
10.7185 | 344.0 | 2752 | 7.6664 |
10.7302 | 345.0 | 2760 | 7.6640 |
10.6203 | 346.0 | 2768 | 7.6669 |
10.6795 | 347.0 | 2776 | 7.6659 |
10.6393 | 348.0 | 2784 | 7.6649 |
10.6661 | 349.0 | 2792 | 7.6653 |
10.6678 | 350.0 | 2800 | 7.6657 |
10.6722 | 351.0 | 2808 | 7.6663 |
10.682 | 352.0 | 2816 | 7.6673 |
10.6311 | 353.0 | 2824 | 7.6666 |
10.6867 | 354.0 | 2832 | 7.6680 |
10.6871 | 355.0 | 2840 | 7.6688 |
10.6724 | 356.0 | 2848 | 7.6683 |
10.6331 | 357.0 | 2856 | 7.6679 |
10.6949 | 358.0 | 2864 | 7.6674 |
10.5949 | 359.0 | 2872 | 7.6675 |
10.6906 | 360.0 | 2880 | 7.6682 |
10.6349 | 361.0 | 2888 | 7.6682 |
10.5854 | 362.0 | 2896 | 7.6684 |
10.678 | 363.0 | 2904 | 7.6685 |
10.6793 | 364.0 | 2912 | 7.6687 |
10.6762 | 365.0 | 2920 | 7.6690 |
10.729 | 366.0 | 2928 | 7.6691 |
10.6483 | 367.0 | 2936 | 7.6689 |
10.6422 | 368.0 | 2944 | 7.6690 |
10.6841 | 369.0 | 2952 | 7.6690 |
10.6743 | 370.0 | 2960 | 7.6690 |
10.6153 | 371.0 | 2968 | 7.6691 |
10.678 | 372.0 | 2976 | 7.6691 |
10.6384 | 373.0 | 2984 | 7.6691 |
10.6086 | 374.0 | 2992 | 7.6691 |
10.6207 | 375.0 | 3000 | 7.6691 |
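
In the table above, validation loss reaches its minimum of 5.6092 at step 504 (epoch 63) and then rises to 7.6691 by step 3000, while training loss keeps falling. The sketch below shows one way to locate that point programmatically. It parses a small excerpt of rows copied from the table; the `parse_rows` helper and the dictionary keys are illustrative names, and the full table could be passed in the same way.

```python
# Excerpt of rows copied from the table above, in the same column order:
# training loss | epoch | step | validation loss
ROWS = """
51.7865 | 1.0 | 8 | 10.1102 |
30.0188 | 26.0 | 208 | 6.0528 |
25.7268 | 62.0 | 496 | 5.6126 |
25.7201 | 63.0 | 504 | 5.6092 |
25.3521 | 64.0 | 512 | 5.6123 |
18.8627 | 125.0 | 1000 | 6.3139 |
10.6207 | 375.0 | 3000 | 7.6691 |
"""


def parse_rows(text: str) -> list[dict]:
    """Parse pipe-separated table rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        # Split on pipes and drop the empty cell left by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        train_loss, epoch, step, val_loss = cells
        rows.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return rows


rows = parse_rows(ROWS)
best = min(rows, key=lambda row: row["val_loss"])  # lowest validation loss
final = rows[-1]                                   # last logged evaluation
gap = final["val_loss"] - best["val_loss"]

print(f"best step: {best['step']} (epoch {best['epoch']:.0f}), "
      f"val loss {best['val_loss']:.4f}")
print(f"final val loss {final['val_loss']:.4f}, gap {gap:.4f}")
```

The reported evaluation loss of 7.6691 matches the final row, not the lowest validation loss in the table.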
Framework versions
- Transformers 4.49.0
- Pytorch 2.4.0+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0