impossible-llms-spanish-random-fourgram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.8039
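
If this loss is the usual mean token-level cross-entropy in nats, it can be converted to perplexity with `exp(loss)`. Note one caveat from the hyperparameters below: training used `label_smoothing_factor: 0.1`, which inflates the raw loss, so the resulting number overstates the model's true perplexity. A minimal sketch:

```python
import math

eval_loss = 7.8039  # reported evaluation loss

# Assuming the loss is a mean token-level cross-entropy in nats,
# perplexity is its exponential. Label smoothing (0.1 during training)
# inflates this value relative to the unsmoothed perplexity.
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.1f}")
```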

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
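
The reported total batch sizes follow from the per-device values: with gradient accumulation, the effective train batch is the per-device batch times the number of devices times the accumulation steps. A quick check of the arithmetic:

```python
# Effective (total) batch sizes, as computed for multi-GPU training
# with gradient accumulation:
train_batch_size = 12            # per-device train batch size
num_devices = 4                  # GPUs
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)    # matches the reported 384

# Evaluation uses no gradient accumulation:
eval_batch_size = 8              # per-device eval batch size
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)     # matches the reported 32
```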

Training results

Training Loss Epoch Step Validation Loss
82.9763 1.0 8 10.1322
75.8242 2.0 16 9.3826
73.0479 3.0 24 9.0547
71.7498 4.0 32 8.9132
70.3057 5.0 40 8.7517
69.052 6.0 48 8.5917
67.5792 7.0 56 8.4113
66.1479 8.0 64 8.1993
64.2954 9.0 72 7.9892
62.6406 10.0 80 7.7753
60.8006 11.0 88 7.5550
58.9376 12.0 96 7.3313
57.1674 13.0 104 7.1176
55.776 14.0 112 6.9191
54.0096 15.0 120 6.7468
53.058 16.0 128 6.6053
52.1354 17.0 136 6.4908
51.3954 18.0 144 6.4133
50.8581 19.0 152 6.3606
50.3787 20.0 160 6.3124
50.3071 21.0 168 6.2731
49.996 22.0 176 6.2434
49.3843 23.0 184 6.1913
49.2434 24.0 192 6.1592
49.0138 25.0 200 6.1245
48.5189 26.0 208 6.0972
48.3509 27.0 216 6.0685
48.2833 28.0 224 6.0437
47.6648 29.0 232 6.0185
47.6211 30.0 240 5.9975
47.4067 31.0 248 5.9759
47.2849 32.0 256 5.9579
46.9754 33.0 264 5.9414
46.9424 34.0 272 5.9332
46.761 35.0 280 5.9065
46.4459 36.0 288 5.8899
46.2071 37.0 296 5.8721
46.1396 38.0 304 5.8573
45.9891 39.0 312 5.8446
45.6531 40.0 320 5.8358
45.4177 41.0 328 5.8188
45.4286 42.0 336 5.8082
44.9946 43.0 344 5.7904
44.9944 44.0 352 5.7830
44.727 45.0 360 5.7677
44.3485 46.0 368 5.7522
44.4296 47.0 376 5.7454
44.2211 48.0 384 5.7408
44.11 49.0 392 5.7288
43.9877 50.0 400 5.7172
43.5827 51.0 408 5.7082
43.5725 52.0 416 5.7019
43.3969 53.0 424 5.6927
43.1558 54.0 432 5.6875
42.8696 55.0 440 5.6895
42.8345 56.0 448 5.6810
42.4876 57.0 456 5.6749
42.4041 58.0 464 5.6796
42.167 59.0 472 5.6686
42.0272 60.0 480 5.6692
41.9472 61.0 488 5.6711
41.817 62.0 496 5.6665
41.5342 63.0 504 5.6705
41.3826 64.0 512 5.6707
41.046 65.0 520 5.6737
40.9818 66.0 528 5.6719
40.8324 67.0 536 5.6784
40.5345 68.0 544 5.6842
40.5791 69.0 552 5.6817
40.3238 70.0 560 5.6913
40.2274 71.0 568 5.6917
40.0427 72.0 576 5.7009
39.6969 73.0 584 5.7088
39.4668 74.0 592 5.7189
39.3444 75.0 600 5.7193
39.1904 76.0 608 5.7337
38.9538 77.0 616 5.7351
38.8954 78.0 624 5.7491
38.3855 79.0 632 5.7509
38.4417 80.0 640 5.7696
38.3127 81.0 648 5.7773
38.1975 82.0 656 5.7909
37.9492 83.0 664 5.7997
37.5814 84.0 672 5.8131
37.5652 85.0 680 5.8178
37.3111 86.0 688 5.8396
37.0125 87.0 696 5.8435
37.1248 88.0 704 5.8526
36.7036 89.0 712 5.8648
36.6775 90.0 720 5.8754
36.3811 91.0 728 5.8901
36.3064 92.0 736 5.9137
35.9747 93.0 744 5.9306
35.9777 94.0 752 5.9342
35.6134 95.0 760 5.9527
35.3654 96.0 768 5.9639
35.3432 97.0 776 5.9763
35.2641 98.0 784 5.9824
35.039 99.0 792 5.9972
34.704 100.0 800 6.0233
34.6955 101.0 808 6.0342
34.4793 102.0 816 6.0511
34.2634 103.0 824 6.0523
34.3291 104.0 832 6.0636
34.0863 105.0 840 6.0937
33.6904 106.0 848 6.1048
33.5764 107.0 856 6.1333
33.4184 108.0 864 6.1246
33.2091 109.0 872 6.1466
33.1212 110.0 880 6.1638
32.8773 111.0 888 6.1685
32.8244 112.0 896 6.1778
32.563 113.0 904 6.2110
32.5085 114.0 912 6.2149
32.0799 115.0 920 6.2259
32.0791 116.0 928 6.2462
31.9647 117.0 936 6.2711
31.7568 118.0 944 6.2816
31.6016 119.0 952 6.2748
31.4066 120.0 960 6.3229
31.3133 121.0 968 6.3207
31.1978 122.0 976 6.3222
30.86 123.0 984 6.3559
30.8673 124.0 992 6.3625
30.6247 125.0 1000 6.3782
30.5435 126.0 1008 6.3907
30.4076 127.0 1016 6.4128
30.2195 128.0 1024 6.4249
30.1669 129.0 1032 6.4416
29.9457 130.0 1040 6.4512
29.6963 131.0 1048 6.4563
29.6095 132.0 1056 6.4673
29.5896 133.0 1064 6.4934
29.3053 134.0 1072 6.5024
29.2496 135.0 1080 6.5245
29.0821 136.0 1088 6.5266
28.9603 137.0 1096 6.5416
28.6974 138.0 1104 6.5673
28.6211 139.0 1112 6.5702
28.455 140.0 1120 6.5887
28.3855 141.0 1128 6.6114
28.1946 142.0 1136 6.6128
27.9813 143.0 1144 6.6373
27.9906 144.0 1152 6.6382
27.8784 145.0 1160 6.6586
27.7172 146.0 1168 6.6575
27.5235 147.0 1176 6.6785
27.5081 148.0 1184 6.6878
27.3395 149.0 1192 6.6958
27.2361 150.0 1200 6.7060
26.948 151.0 1208 6.7346
26.9204 152.0 1216 6.7387
26.887 153.0 1224 6.7592
26.6024 154.0 1232 6.7647
26.5708 155.0 1240 6.7677
26.476 156.0 1248 6.7781
26.4475 157.0 1256 6.7916
26.2854 158.0 1264 6.8156
26.188 159.0 1272 6.8198
26.0161 160.0 1280 6.8430
26.024 161.0 1288 6.8527
25.7818 162.0 1296 6.8659
25.7232 163.0 1304 6.8707
25.5568 164.0 1312 6.8754
25.3005 165.0 1320 6.9005
25.2464 166.0 1328 6.9110
25.3019 167.0 1336 6.9115
25.0743 168.0 1344 6.9386
25.0131 169.0 1352 6.9445
24.9058 170.0 1360 6.9428
24.8692 171.0 1368 6.9654
24.6763 172.0 1376 6.9659
24.6819 173.0 1384 6.9842
24.4872 174.0 1392 6.9843
24.4557 175.0 1400 7.0022
24.3423 176.0 1408 7.0041
24.1534 177.0 1416 7.0141
24.0958 178.0 1424 7.0263
24.1577 179.0 1432 7.0363
23.8901 180.0 1440 7.0587
23.8961 181.0 1448 7.0664
23.7979 182.0 1456 7.0685
23.6696 183.0 1464 7.0815
23.4375 184.0 1472 7.0878
23.4381 185.0 1480 7.1067
23.3986 186.0 1488 7.1106
23.3941 187.0 1496 7.1363
23.2634 188.0 1504 7.1369
23.1304 189.0 1512 7.1397
23.0716 190.0 1520 7.1489
22.9345 191.0 1528 7.1453
22.9294 192.0 1536 7.1651
22.8221 193.0 1544 7.1774
22.7517 194.0 1552 7.1864
22.5896 195.0 1560 7.1922
22.6763 196.0 1568 7.2035
22.543 197.0 1576 7.2001
22.3312 198.0 1584 7.2166
22.2966 199.0 1592 7.2292
22.2998 200.0 1600 7.2368
22.2441 201.0 1608 7.2382
22.1787 202.0 1616 7.2474
22.036 203.0 1624 7.2559
22.0256 204.0 1632 7.2747
21.8264 205.0 1640 7.2887
21.847 206.0 1648 7.2940
21.812 207.0 1656 7.3009
21.7328 208.0 1664 7.3048
21.6812 209.0 1672 7.3017
21.512 210.0 1680 7.3180
21.5199 211.0 1688 7.3315
21.3427 212.0 1696 7.3404
21.3916 213.0 1704 7.3363
21.3311 214.0 1712 7.3467
21.2813 215.0 1720 7.3588
21.1493 216.0 1728 7.3633
21.0464 217.0 1736 7.3743
21.0938 218.0 1744 7.3795
21.0753 219.0 1752 7.3858
20.8909 220.0 1760 7.3916
20.8487 221.0 1768 7.3935
20.739 222.0 1776 7.4054
20.7887 223.0 1784 7.4154
20.6405 224.0 1792 7.4227
20.6447 225.0 1800 7.4266
20.4904 226.0 1808 7.4339
20.5563 227.0 1816 7.4355
20.4321 228.0 1824 7.4435
20.5369 229.0 1832 7.4500
20.3137 230.0 1840 7.4613
20.332 231.0 1848 7.4682
20.2773 232.0 1856 7.4766
20.1258 233.0 1864 7.4802
20.1112 234.0 1872 7.4889
20.0866 235.0 1880 7.4954
19.9962 236.0 1888 7.4912
19.9217 237.0 1896 7.4984
19.9182 238.0 1904 7.5087
19.8691 239.0 1912 7.5181
19.8146 240.0 1920 7.5179
19.7451 241.0 1928 7.5246
19.8442 242.0 1936 7.5331
19.7008 243.0 1944 7.5308
19.6039 244.0 1952 7.5359
19.5752 245.0 1960 7.5486
19.603 246.0 1968 7.5452
19.526 247.0 1976 7.5561
19.4567 248.0 1984 7.5610
19.4087 249.0 1992 7.5669
19.4502 250.0 2000 7.5744
19.3038 251.0 2008 7.5777
19.2949 252.0 2016 7.5868
19.1793 253.0 2024 7.5907
19.3054 254.0 2032 7.5956
19.2497 255.0 2040 7.5922
19.1428 256.0 2048 7.6026
19.1033 257.0 2056 7.5987
19.04 258.0 2064 7.6085
19.1332 259.0 2072 7.6133
19.1033 260.0 2080 7.6134
18.9681 261.0 2088 7.6296
18.9423 262.0 2096 7.6220
18.8202 263.0 2104 7.6296
18.9517 264.0 2112 7.6344
18.9199 265.0 2120 7.6375
18.7935 266.0 2128 7.6411
18.7623 267.0 2136 7.6387
18.7257 268.0 2144 7.6487
18.6256 269.0 2152 7.6554
18.6174 270.0 2160 7.6597
18.6969 271.0 2168 7.6578
18.6943 272.0 2176 7.6664
18.4693 273.0 2184 7.6733
18.5465 274.0 2192 7.6674
18.5389 275.0 2200 7.6751
18.4663 276.0 2208 7.6773
18.4731 277.0 2216 7.6769
18.353 278.0 2224 7.6844
18.3503 279.0 2232 7.6879
18.4071 280.0 2240 7.6856
18.4131 281.0 2248 7.6884
18.3167 282.0 2256 7.6958
18.3702 283.0 2264 7.6996
18.2415 284.0 2272 7.7069
18.2712 285.0 2280 7.7060
18.2295 286.0 2288 7.7062
18.2393 287.0 2296 7.7129
18.1915 288.0 2304 7.7102
18.1751 289.0 2312 7.7178
18.1592 290.0 2320 7.7193
18.0663 291.0 2328 7.7231
18.0893 292.0 2336 7.7209
18.0764 293.0 2344 7.7281
18.0296 294.0 2352 7.7293
18.0648 295.0 2360 7.7317
17.98 296.0 2368 7.7354
17.9572 297.0 2376 7.7357
18.0146 298.0 2384 7.7367
18.0386 299.0 2392 7.7416
17.9969 300.0 2400 7.7399
17.9766 301.0 2408 7.7446
18.0088 302.0 2416 7.7450
17.9913 303.0 2424 7.7492
17.9319 304.0 2432 7.7475
17.8594 305.0 2440 7.7524
17.9173 306.0 2448 7.7529
17.8649 307.0 2456 7.7508
17.8351 308.0 2464 7.7565
17.8638 309.0 2472 7.7632
17.8086 310.0 2480 7.7615
17.7833 311.0 2488 7.7627
17.7528 312.0 2496 7.7661
17.7985 313.0 2504 7.7679
17.643 314.0 2512 7.7694
17.6903 315.0 2520 7.7696
17.6834 316.0 2528 7.7720
17.7201 317.0 2536 7.7711
17.7552 318.0 2544 7.7751
17.6127 319.0 2552 7.7723
17.6698 320.0 2560 7.7758
17.6404 321.0 2568 7.7786
17.6297 322.0 2576 7.7802
17.5499 323.0 2584 7.7829
17.6372 324.0 2592 7.7820
17.5715 325.0 2600 7.7868
17.6256 326.0 2608 7.7871
17.5635 327.0 2616 7.7849
17.5831 328.0 2624 7.7874
17.6796 329.0 2632 7.7896
17.5551 330.0 2640 7.7881
17.5849 331.0 2648 7.7889
17.5815 332.0 2656 7.7904
17.5988 333.0 2664 7.7896
17.4424 334.0 2672 7.7922
17.4758 335.0 2680 7.7942
17.5238 336.0 2688 7.7924
17.5397 337.0 2696 7.7945
17.5032 338.0 2704 7.7939
17.5263 339.0 2712 7.7973
17.4627 340.0 2720 7.7952
17.471 341.0 2728 7.7981
17.5093 342.0 2736 7.8003
17.4304 343.0 2744 7.7967
17.4437 344.0 2752 7.7986
17.4244 345.0 2760 7.7994
17.447 346.0 2768 7.8000
17.5448 347.0 2776 7.8001
17.5153 348.0 2784 7.8014
17.5372 349.0 2792 7.8007
17.4114 350.0 2800 7.8015
17.3633 351.0 2808 7.8009
17.4397 352.0 2816 7.8017
17.387 353.0 2824 7.8022
17.5439 354.0 2832 7.8022
17.3737 355.0 2840 7.8020
17.4433 356.0 2848 7.8020
17.4149 357.0 2856 7.8023
17.4908 358.0 2864 7.8023
17.4608 359.0 2872 7.8033
17.3332 360.0 2880 7.8033
17.4148 361.0 2888 7.8037
17.3321 362.0 2896 7.8039
17.4242 363.0 2904 7.8036
17.4116 364.0 2912 7.8034
17.4484 365.0 2920 7.8036
17.4695 366.0 2928 7.8038
17.4316 367.0 2936 7.8039
17.4131 368.0 2944 7.8040
17.3338 369.0 2952 7.8040
17.3707 370.0 2960 7.8039
17.3902 371.0 2968 7.8039
17.3523 372.0 2976 7.8039
17.4936 373.0 2984 7.8039
17.4325 374.0 2992 7.8039
17.4207 375.0 3000 7.8039
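
Reading the table, validation loss bottoms out at 5.6665 around epoch 62 (step 496) and then climbs steadily back to 7.8039 by epoch 375 while training loss keeps falling, a classic overfitting pattern. If the best checkpoint were wanted rather than the final one, it can be located programmatically; a minimal sketch over a subset of the logged (epoch, validation loss) pairs from the table:

```python
# A representative subset of (epoch, validation_loss) pairs from the
# table above; the full log has one entry per epoch.
val_log = [
    (1, 10.1322), (30, 5.9975), (57, 5.6749), (62, 5.6665),
    (63, 5.6705), (100, 6.0233), (200, 7.2368), (300, 7.7399), (375, 7.8039),
]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss = min(val_log, key=lambda pair: pair[1])
print(f"best checkpoint: epoch {best_epoch}, val loss {best_loss}")
```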

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size

  • 126M parameters (F32, Safetensors)