impossible-llms-spanish-fronting-n

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.4765
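
Since only the evaluation loss is reported, a rough perplexity can be derived from it. A minimal sketch, assuming the loss is a mean token-level cross-entropy (note that training used label smoothing, see below, so the figure is only indicative):

```python
import math

eval_loss = 7.4765  # reported evaluation loss
# If this is mean token-level cross-entropy, perplexity = exp(loss).
# Caveat: training used label_smoothing_factor=0.1, so the reported loss
# is not pure cross-entropy and this perplexity is only approximate.
perplexity = math.exp(eval_loss)
print(f"approx. perplexity: {perplexity:.1f}")  # ~1766
```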

Model description

More information needed

Intended uses & limitations

More information needed
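
No usage details are provided. As a minimal sketch, assuming the checkpoint is a causal language model hosted under a repository named after this card (the repo id below is a placeholder, not confirmed by the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual namespace/model path.
repo_id = "your-namespace/impossible-llms-spanish-fronting-n"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Score a Spanish sentence with the model (assumed to have a causal LM head).
inputs = tokenizer("El gato duerme en la cocina.", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"loss: {outputs.loss.item():.4f}")
```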

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
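
For reference, the list above corresponds roughly to the following transformers TrainingArguments; this is a sketch, not the exact training script (output_dir and the fp16 flag standing in for "Native AMP" are assumptions):

```python
from transformers import TrainingArguments

# Sketch of the reported configuration. With 4 GPUs, a per-device train
# batch of 12 and 8 gradient-accumulation steps give the total train
# batch size of 4 * 12 * 8 = 384 reported above.
args = TrainingArguments(
    output_dir="impossible-llms-spanish-fronting-n",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP"; fp16 rather than bf16 is an assumption
    label_smoothing_factor=0.1,
)
```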

Training results

Training Loss    Epoch    Step    Validation Loss
-------------    -----    ----    ---------------
51.7784 1.0 8 10.1116
47.2634 2.0 16 9.3470
45.3511 3.0 24 9.0155
44.5456 4.0 32 8.8750
43.7733 5.0 40 8.7163
42.7853 6.0 48 8.5225
41.7995 7.0 56 8.3087
40.4943 8.0 64 8.0867
39.302 9.0 72 7.8698
38.4185 10.0 80 7.6418
37.3145 11.0 88 7.4187
36.1962 12.0 96 7.1951
34.9774 13.0 104 6.9837
34.0831 14.0 112 6.7900
33.1721 15.0 120 6.6131
32.4419 16.0 128 6.4728
31.9161 17.0 136 6.3644
31.2219 18.0 144 6.2884
31.1113 19.0 152 6.2115
30.7667 20.0 160 6.1593
30.5015 21.0 168 6.1089
30.3263 22.0 176 6.0646
29.9469 23.0 184 6.0196
29.8472 24.0 192 5.9851
29.634 25.0 200 5.9474
29.3636 26.0 208 5.9247
29.3971 27.0 216 5.8943
28.9826 28.0 224 5.8664
28.9979 29.0 232 5.8435
28.7772 30.0 240 5.8160
28.8633 31.0 248 5.8035
28.8801 32.0 256 5.7748
28.613 33.0 264 5.7570
28.377 34.0 272 5.7357
28.2545 35.0 280 5.7216
28.2638 36.0 288 5.7025
27.9661 37.0 296 5.6895
27.8934 38.0 304 5.6784
27.8273 39.0 312 5.6638
27.5359 40.0 320 5.6506
27.4462 41.0 328 5.6327
27.4709 42.0 336 5.6117
27.3402 43.0 344 5.6071
27.1623 44.0 352 5.5898
27.0289 45.0 360 5.5780
26.8429 46.0 368 5.5608
26.7129 47.0 376 5.5495
26.6637 48.0 384 5.5411
26.5017 49.0 392 5.5316
26.5155 50.0 400 5.5264
26.4028 51.0 408 5.5115
26.1552 52.0 416 5.5025
26.1335 53.0 424 5.5021
25.9309 54.0 432 5.4959
25.8646 55.0 440 5.4891
25.7497 56.0 448 5.4911
25.4582 57.0 456 5.4858
25.507 58.0 464 5.4752
25.3367 59.0 472 5.4773
25.353 60.0 480 5.4720
25.0761 61.0 488 5.4754
25.0215 62.0 496 5.4691
25.0473 63.0 504 5.4730
24.6567 64.0 512 5.4700
24.7463 65.0 520 5.4742
24.6361 66.0 528 5.4784
24.4217 67.0 536 5.4747
24.3268 68.0 544 5.4747
24.264 69.0 552 5.4828
24.284 70.0 560 5.4972
24.1288 71.0 568 5.4933
23.9271 72.0 576 5.4963
23.9558 73.0 584 5.5114
23.8464 74.0 592 5.5095
23.5567 75.0 600 5.5177
23.4508 76.0 608 5.5224
23.3597 77.0 616 5.5401
23.2514 78.0 624 5.5444
23.0297 79.0 632 5.5501
22.8961 80.0 640 5.5634
22.9509 81.0 648 5.5759
22.856 82.0 656 5.5821
22.5475 83.0 664 5.5873
22.3554 84.0 672 5.6013
22.4708 85.0 680 5.6089
22.2423 86.0 688 5.6206
22.1425 87.0 696 5.6361
22.1266 88.0 704 5.6449
21.9551 89.0 712 5.6537
21.7956 90.0 720 5.6684
21.7506 91.0 728 5.6829
21.766 92.0 736 5.6955
21.4058 93.0 744 5.7129
21.5818 94.0 752 5.7187
21.5058 95.0 760 5.7243
21.3181 96.0 768 5.7320
21.0303 97.0 776 5.7566
20.9767 98.0 784 5.7750
20.8306 99.0 792 5.7804
20.8078 100.0 800 5.7926
20.5814 101.0 808 5.8065
20.5153 102.0 816 5.8193
20.33 103.0 824 5.8343
20.304 104.0 832 5.8458
20.4121 105.0 840 5.8602
20.1227 106.0 848 5.8680
20.107 107.0 856 5.8870
20.0244 108.0 864 5.8983
19.8429 109.0 872 5.9154
19.6658 110.0 880 5.9283
19.7151 111.0 888 5.9440
19.52 112.0 896 5.9515
19.4396 113.0 904 5.9701
19.3786 114.0 912 5.9825
19.2078 115.0 920 6.0011
19.059 116.0 928 6.0191
19.0577 117.0 936 6.0321
18.9654 118.0 944 6.0395
18.8006 119.0 952 6.0530
18.7847 120.0 960 6.0753
18.6216 121.0 968 6.0882
18.6164 122.0 976 6.1006
18.4312 123.0 984 6.1104
18.3627 124.0 992 6.1188
18.323 125.0 1000 6.1404
18.1081 126.0 1008 6.1458
18.1846 127.0 1016 6.1653
17.9619 128.0 1024 6.1814
17.8304 129.0 1032 6.1954
17.9639 130.0 1040 6.2034
17.7216 131.0 1048 6.2219
17.5679 132.0 1056 6.2268
17.6338 133.0 1064 6.2405
17.4533 134.0 1072 6.2574
17.3785 135.0 1080 6.2745
17.336 136.0 1088 6.2777
17.239 137.0 1096 6.3045
17.2146 138.0 1104 6.3152
16.9797 139.0 1112 6.3267
16.8837 140.0 1120 6.3396
16.9108 141.0 1128 6.3553
16.7512 142.0 1136 6.3601
16.7044 143.0 1144 6.3728
16.672 144.0 1152 6.3835
16.5641 145.0 1160 6.4016
16.5522 146.0 1168 6.4245
16.4778 147.0 1176 6.4233
16.3825 148.0 1184 6.4333
16.3415 149.0 1192 6.4436
16.1825 150.0 1200 6.4635
16.1146 151.0 1208 6.4792
16.034 152.0 1216 6.4810
16.015 153.0 1224 6.5045
15.8937 154.0 1232 6.5265
15.8686 155.0 1240 6.5210
15.7685 156.0 1248 6.5367
15.7502 157.0 1256 6.5394
15.7103 158.0 1264 6.5634
15.5698 159.0 1272 6.5660
15.5699 160.0 1280 6.5810
15.4384 161.0 1288 6.5823
15.303 162.0 1296 6.5966
15.2721 163.0 1304 6.6100
15.2977 164.0 1312 6.6249
15.2382 165.0 1320 6.6279
15.0412 166.0 1328 6.6498
15.0258 167.0 1336 6.6603
14.9784 168.0 1344 6.6658
14.8854 169.0 1352 6.6881
14.8676 170.0 1360 6.6862
14.7526 171.0 1368 6.6944
14.7146 172.0 1376 6.7082
14.7218 173.0 1384 6.7177
14.6535 174.0 1392 6.7232
14.5767 175.0 1400 6.7316
14.4947 176.0 1408 6.7440
14.5144 177.0 1416 6.7538
14.4185 178.0 1424 6.7655
14.3 179.0 1432 6.7633
14.3174 180.0 1440 6.7894
14.191 181.0 1448 6.7982
14.143 182.0 1456 6.8087
14.0423 183.0 1464 6.8203
14.018 184.0 1472 6.8300
14.0147 185.0 1480 6.8352
13.8292 186.0 1488 6.8433
13.8796 187.0 1496 6.8489
13.838 188.0 1504 6.8627
13.7829 189.0 1512 6.8665
13.7829 190.0 1520 6.8765
13.6909 191.0 1528 6.8911
13.6776 192.0 1536 6.8976
13.5998 193.0 1544 6.9042
13.5913 194.0 1552 6.9140
13.3959 195.0 1560 6.9231
13.4454 196.0 1568 6.9237
13.4054 197.0 1576 6.9486
13.4084 198.0 1584 6.9489
13.2953 199.0 1592 6.9499
13.1916 200.0 1600 6.9641
13.1678 201.0 1608 6.9675
13.2026 202.0 1616 6.9851
13.1273 203.0 1624 6.9830
13.0846 204.0 1632 6.9894
13.0633 205.0 1640 7.0006
13.0592 206.0 1648 7.0150
12.9165 207.0 1656 7.0081
12.9007 208.0 1664 7.0187
12.8814 209.0 1672 7.0224
12.8298 210.0 1680 7.0354
12.8303 211.0 1688 7.0361
12.713 212.0 1696 7.0449
12.706 213.0 1704 7.0602
12.7231 214.0 1712 7.0602
12.6166 215.0 1720 7.0736
12.5342 216.0 1728 7.0808
12.534 217.0 1736 7.0942
12.5529 218.0 1744 7.0907
12.5035 219.0 1752 7.0909
12.4415 220.0 1760 7.1029
12.4305 221.0 1768 7.1177
12.3753 222.0 1776 7.1214
12.3459 223.0 1784 7.1190
12.4079 224.0 1792 7.1317
12.2867 225.0 1800 7.1421
12.2105 226.0 1808 7.1439
12.1905 227.0 1816 7.1524
12.2438 228.0 1824 7.1568
12.2483 229.0 1832 7.1593
12.1855 230.0 1840 7.1681
12.0293 231.0 1848 7.1758
12.0954 232.0 1856 7.1741
12.0445 233.0 1864 7.1902
11.9761 234.0 1872 7.1862
11.8848 235.0 1880 7.1927
11.9529 236.0 1888 7.1979
11.885 237.0 1896 7.2046
11.7895 238.0 1904 7.2161
11.7884 239.0 1912 7.2123
11.779 240.0 1920 7.2259
11.8171 241.0 1928 7.2221
11.7989 242.0 1936 7.2265
11.7214 243.0 1944 7.2291
11.7687 244.0 1952 7.2418
11.7037 245.0 1960 7.2363
11.6243 246.0 1968 7.2472
11.7001 247.0 1976 7.2602
11.6701 248.0 1984 7.2492
11.6172 249.0 1992 7.2614
11.6292 250.0 2000 7.2619
11.5318 251.0 2008 7.2705
11.5071 252.0 2016 7.2762
11.4715 253.0 2024 7.2762
11.4828 254.0 2032 7.2840
11.4232 255.0 2040 7.2845
11.4747 256.0 2048 7.2895
11.3745 257.0 2056 7.2940
11.3526 258.0 2064 7.2957
11.3843 259.0 2072 7.3001
11.3224 260.0 2080 7.3046
11.3156 261.0 2088 7.3082
11.3194 262.0 2096 7.3134
11.2449 263.0 2104 7.3191
11.2619 264.0 2112 7.3185
11.2628 265.0 2120 7.3283
11.2272 266.0 2128 7.3375
11.1958 267.0 2136 7.3299
11.1988 268.0 2144 7.3399
11.1489 269.0 2152 7.3462
11.1609 270.0 2160 7.3442
11.102 271.0 2168 7.3518
11.1255 272.0 2176 7.3539
11.0609 273.0 2184 7.3515
11.1389 274.0 2192 7.3631
11.0255 275.0 2200 7.3614
11.0338 276.0 2208 7.3632
10.9559 277.0 2216 7.3717
10.966 278.0 2224 7.3656
10.9771 279.0 2232 7.3719
10.9786 280.0 2240 7.3738
10.9463 281.0 2248 7.3744
10.8997 282.0 2256 7.3783
10.8937 283.0 2264 7.3798
10.8832 284.0 2272 7.3848
10.8433 285.0 2280 7.3872
10.9294 286.0 2288 7.3907
10.7831 287.0 2296 7.3915
10.8367 288.0 2304 7.3944
10.8091 289.0 2312 7.4008
10.8585 290.0 2320 7.3947
10.7879 291.0 2328 7.4077
10.767 292.0 2336 7.4011
10.8139 293.0 2344 7.4059
10.7572 294.0 2352 7.4066
10.7829 295.0 2360 7.4123
10.7285 296.0 2368 7.4130
10.7642 297.0 2376 7.4128
10.7138 298.0 2384 7.4188
10.7285 299.0 2392 7.4162
10.7366 300.0 2400 7.4211
10.7125 301.0 2408 7.4212
10.6999 302.0 2416 7.4237
10.6657 303.0 2424 7.4270
10.6579 304.0 2432 7.4259
10.7066 305.0 2440 7.4298
10.6463 306.0 2448 7.4331
10.683 307.0 2456 7.4295
10.6238 308.0 2464 7.4371
10.6731 309.0 2472 7.4378
10.6708 310.0 2480 7.4341
10.6065 311.0 2488 7.4432
10.5891 312.0 2496 7.4401
10.581 313.0 2504 7.4412
10.564 314.0 2512 7.4472
10.6173 315.0 2520 7.4462
10.6134 316.0 2528 7.4493
10.5954 317.0 2536 7.4517
10.5617 318.0 2544 7.4524
10.5486 319.0 2552 7.4545
10.5362 320.0 2560 7.4524
10.5081 321.0 2568 7.4523
10.5617 322.0 2576 7.4552
10.521 323.0 2584 7.4573
10.5199 324.0 2592 7.4553
10.5411 325.0 2600 7.4570
10.5266 326.0 2608 7.4552
10.5189 327.0 2616 7.4620
10.5166 328.0 2624 7.4585
10.454 329.0 2632 7.4612
10.5651 330.0 2640 7.4632
10.5066 331.0 2648 7.4632
10.5266 332.0 2656 7.4614
10.5283 333.0 2664 7.4652
10.4858 334.0 2672 7.4663
10.4521 335.0 2680 7.4663
10.4572 336.0 2688 7.4655
10.5046 337.0 2696 7.4665
10.4653 338.0 2704 7.4674
10.4872 339.0 2712 7.4699
10.4266 340.0 2720 7.4682
10.4524 341.0 2728 7.4707
10.4534 342.0 2736 7.4708
10.4305 343.0 2744 7.4715
10.4627 344.0 2752 7.4713
10.4557 345.0 2760 7.4736
10.3853 346.0 2768 7.4724
10.4317 347.0 2776 7.4720
10.3978 348.0 2784 7.4729
10.4376 349.0 2792 7.4746
10.4341 350.0 2800 7.4734
10.4245 351.0 2808 7.4735
10.415 352.0 2816 7.4741
10.3609 353.0 2824 7.4739
10.4545 354.0 2832 7.4749
10.4374 355.0 2840 7.4748
10.4411 356.0 2848 7.4753
10.3917 357.0 2856 7.4759
10.4483 358.0 2864 7.4765
10.3699 359.0 2872 7.4759
10.4319 360.0 2880 7.4760
10.3863 361.0 2888 7.4761
10.35 362.0 2896 7.4759
10.4225 363.0 2904 7.4756
10.4223 364.0 2912 7.4758
10.4389 365.0 2920 7.4760
10.469 366.0 2928 7.4764
10.4033 367.0 2936 7.4765
10.3862 368.0 2944 7.4765
10.4426 369.0 2952 7.4765
10.4102 370.0 2960 7.4766
10.3714 371.0 2968 7.4765
10.4194 372.0 2976 7.4765
10.3912 373.0 2984 7.4765
10.359 374.0 2992 7.4765
10.3752 375.0 3000 7.4765
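
Note that the validation loss reaches its minimum (5.4691 at epoch 62, step 496) and then rises steadily while the training loss keeps falling, so the final reported loss of 7.4765 comes from the last step rather than the best checkpoint. If the best checkpoint were preferred, the Trainer can retain it; a minimal sketch (the evaluation/save cadence and output_dir are assumptions):

```python
from transformers import TrainingArguments

# Sketch: keep the checkpoint with the lowest validation loss instead of
# the final one. Per-epoch evaluation/saving is an assumption.
args = TrainingArguments(
    output_dir="impossible-llms-spanish-fronting-n",  # assumed
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```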

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Safetensors

  • Model size: 126M params
  • Tensor type: F32