impossible-llms-french-random

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 6.3476
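
For reference, a mean cross-entropy loss can be converted into perplexity via exp(loss). A minimal sketch, assuming the reported value is mean token-level cross-entropy (the label smoothing of 0.1 used during training inflates the raw loss, so the true perplexity is somewhat lower):

```python
import math

# Assumption: the reported eval loss (6.3476) is mean token-level cross-entropy.
# Label smoothing (0.1) inflates this value relative to unsmoothed cross-entropy,
# so the true perplexity would be somewhat lower than this estimate.
eval_loss = 6.3476
perplexity = math.exp(eval_loss)
print(f"approximate perplexity: {perplexity:.0f}")  # ~571
```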

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent configuration follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
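
As a rough reconstruction, the list above corresponds to a Hugging Face `TrainingArguments` configuration along these lines. This is a sketch, not the original training script; the output path is a hypothetical placeholder:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameter list above (not the original script).
# Effective train batch size: 12 per device x 4 GPUs x 8 accumulation steps = 384,
# matching the total_train_batch_size reported above.
training_args = TrainingArguments(
    output_dir="impossible-llms-french-random",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed-precision training
    label_smoothing_factor=0.1,
)
```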

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.7239 | 1.0 | 14 | 9.5940 |
| 9.0541 | 2.0 | 28 | 8.9362 |
| 8.4544 | 3.0 | 42 | 8.4159 |
| 8.1879 | 4.0 | 56 | 8.0582 |
| 7.7091 | 5.0 | 70 | 7.6960 |
| 7.4121 | 6.0 | 84 | 7.3008 |
| 6.7935 | 7.0 | 98 | 6.8889 |
| 6.4858 | 8.0 | 112 | 6.4966 |
| 6.153 | 9.0 | 126 | 6.1812 |
| 6.0399 | 10.0 | 140 | 5.9941 |
| 5.7798 | 11.0 | 154 | 5.9007 |
| 5.8275 | 12.0 | 168 | 5.8459 |
| 5.8608 | 13.0 | 182 | 5.8097 |
| 5.5822 | 14.0 | 196 | 5.7785 |
| 5.7961 | 15.0 | 210 | 5.7371 |
| 5.71 | 16.0 | 224 | 5.7096 |
| 5.559 | 17.0 | 238 | 5.6688 |
| 5.5411 | 18.0 | 252 | 5.6417 |
| 5.5046 | 19.0 | 266 | 5.6033 |
| 5.6118 | 20.0 | 280 | 5.5782 |
| 5.5337 | 21.0 | 294 | 5.5429 |
| 5.3872 | 22.0 | 308 | 5.5196 |
| 5.4351 | 23.0 | 322 | 5.4914 |
| 5.3531 | 24.0 | 336 | 5.4664 |
| 5.3101 | 25.0 | 350 | 5.4374 |
| 5.4831 | 26.0 | 364 | 5.4167 |
| 5.3678 | 27.0 | 378 | 5.3939 |
| 5.3613 | 28.0 | 392 | 5.3693 |
| 5.1893 | 29.0 | 406 | 5.3492 |
| 5.0765 | 30.0 | 420 | 5.3295 |
| 5.214 | 31.0 | 434 | 5.3068 |
| 5.1524 | 32.0 | 448 | 5.2878 |
| 5.1647 | 33.0 | 462 | 5.2689 |
| 5.085 | 34.0 | 476 | 5.2546 |
| 5.1427 | 35.0 | 490 | 5.2398 |
| 5.0362 | 36.0 | 504 | 5.2204 |
| 5.1251 | 37.0 | 518 | 5.2122 |
| 5.0764 | 38.0 | 532 | 5.1996 |
| 5.0734 | 39.0 | 546 | 5.1884 |
| 5.0256 | 40.0 | 560 | 5.1747 |
| 4.8349 | 41.0 | 574 | 5.1653 |
| 5.0308 | 42.0 | 588 | 5.1561 |
| 4.859 | 43.0 | 602 | 5.1471 |
| 4.8695 | 44.0 | 616 | 5.1402 |
| 4.9189 | 45.0 | 630 | 5.1360 |
| 4.8724 | 46.0 | 644 | 5.1297 |
| 4.7789 | 47.0 | 658 | 5.1279 |
| 4.8509 | 48.0 | 672 | 5.1230 |
| 4.8738 | 49.0 | 686 | 5.1192 |
| 4.8657 | 50.0 | 700 | 5.1171 |
| 4.8311 | 51.0 | 714 | 5.1139 |
| 4.6525 | 52.0 | 728 | 5.1145 |
| 4.738 | 53.0 | 742 | 5.1128 |
| 4.7458 | 54.0 | 756 | 5.1134 |
| 4.7314 | 55.0 | 770 | 5.1130 |
| 4.7124 | 56.0 | 784 | 5.1159 |
| 4.6434 | 57.0 | 798 | 5.1184 |
| 4.5187 | 58.0 | 812 | 5.1223 |
| 4.7165 | 59.0 | 826 | 5.1260 |
| 4.6123 | 60.0 | 840 | 5.1296 |
| 4.5566 | 61.0 | 854 | 5.1315 |
| 4.6051 | 62.0 | 868 | 5.1424 |
| 4.5273 | 63.0 | 882 | 5.1504 |
| 4.5302 | 64.0 | 896 | 5.1540 |
| 4.5082 | 65.0 | 910 | 5.1608 |
| 4.4846 | 66.0 | 924 | 5.1688 |
| 4.6435 | 67.0 | 938 | 5.1760 |
| 4.4568 | 68.0 | 952 | 5.1869 |
| 4.3656 | 69.0 | 966 | 5.1966 |
| 4.4936 | 70.0 | 980 | 5.2027 |
| 4.4011 | 71.0 | 994 | 5.2116 |
| 4.4574 | 72.0 | 1008 | 5.2247 |
| 4.3478 | 73.0 | 1022 | 5.2348 |
| 4.3774 | 74.0 | 1036 | 5.2388 |
| 4.3929 | 75.0 | 1050 | 5.2574 |
| 4.2449 | 76.0 | 1064 | 5.2648 |
| 4.3984 | 77.0 | 1078 | 5.2687 |
| 4.307 | 78.0 | 1092 | 5.2945 |
| 4.2001 | 79.0 | 1106 | 5.3104 |
| 4.1673 | 80.0 | 1120 | 5.3154 |
| 4.1315 | 81.0 | 1134 | 5.3283 |
| 4.1405 | 82.0 | 1148 | 5.3336 |
| 4.1806 | 83.0 | 1162 | 5.3505 |
| 4.0884 | 84.0 | 1176 | 5.3670 |
| 4.1216 | 85.0 | 1190 | 5.3833 |
| 3.9938 | 86.0 | 1204 | 5.3921 |
| 4.2434 | 87.0 | 1218 | 5.4204 |
| 4.1055 | 88.0 | 1232 | 5.4208 |
| 4.056 | 89.0 | 1246 | 5.4324 |
| 4.1179 | 90.0 | 1260 | 5.4422 |
| 3.9529 | 91.0 | 1274 | 5.4620 |
| 3.9277 | 92.0 | 1288 | 5.4791 |
| 4.0086 | 93.0 | 1302 | 5.4906 |
| 4.0078 | 94.0 | 1316 | 5.5060 |
| 3.967 | 95.0 | 1330 | 5.5235 |
| 3.8827 | 96.0 | 1344 | 5.5325 |
| 3.9483 | 97.0 | 1358 | 5.5452 |
| 3.7035 | 98.0 | 1372 | 5.5580 |
| 3.9345 | 99.0 | 1386 | 5.5694 |
| 3.7981 | 100.0 | 1400 | 5.5881 |
| 3.8 | 101.0 | 1414 | 5.5959 |
| 3.8909 | 102.0 | 1428 | 5.6176 |
| 3.7846 | 103.0 | 1442 | 5.6213 |
| 3.7542 | 104.0 | 1456 | 5.6423 |
| 3.8228 | 105.0 | 1470 | 5.6563 |
| 3.7579 | 106.0 | 1484 | 5.6698 |
| 3.7408 | 107.0 | 1498 | 5.6842 |
| 3.743 | 108.0 | 1512 | 5.6946 |
| 3.7625 | 109.0 | 1526 | 5.7063 |
| 3.7048 | 110.0 | 1540 | 5.7205 |
| 3.7228 | 111.0 | 1554 | 5.7425 |
| 3.7455 | 112.0 | 1568 | 5.7417 |
| 3.6156 | 113.0 | 1582 | 5.7529 |
| 3.6318 | 114.0 | 1596 | 5.7795 |
| 3.6512 | 115.0 | 1610 | 5.7867 |
| 3.6084 | 116.0 | 1624 | 5.7983 |
| 3.6663 | 117.0 | 1638 | 5.8173 |
| 3.5936 | 118.0 | 1652 | 5.8259 |
| 3.5861 | 119.0 | 1666 | 5.8341 |
| 3.6022 | 120.0 | 1680 | 5.8478 |
| 3.5503 | 121.0 | 1694 | 5.8661 |
| 3.5486 | 122.0 | 1708 | 5.8782 |
| 3.5129 | 123.0 | 1722 | 5.8781 |
| 3.487 | 124.0 | 1736 | 5.8993 |
| 3.4293 | 125.0 | 1750 | 5.9005 |
| 3.4851 | 126.0 | 1764 | 5.9226 |
| 3.4599 | 127.0 | 1778 | 5.9286 |
| 3.4263 | 128.0 | 1792 | 5.9358 |
| 3.4926 | 129.0 | 1806 | 5.9525 |
| 3.4187 | 130.0 | 1820 | 5.9570 |
| 3.4111 | 131.0 | 1834 | 5.9680 |
| 3.3141 | 132.0 | 1848 | 5.9796 |
| 3.3957 | 133.0 | 1862 | 6.0007 |
| 3.3552 | 134.0 | 1876 | 6.0074 |
| 3.3251 | 135.0 | 1890 | 6.0152 |
| 3.3961 | 136.0 | 1904 | 6.0240 |
| 3.3969 | 137.0 | 1918 | 6.0281 |
| 3.3388 | 138.0 | 1932 | 6.0427 |
| 3.3769 | 139.0 | 1946 | 6.0496 |
| 3.3347 | 140.0 | 1960 | 6.0549 |
| 3.3017 | 141.0 | 1974 | 6.0707 |
| 3.3262 | 142.0 | 1988 | 6.0811 |
| 3.2948 | 143.0 | 2002 | 6.0822 |
| 3.2483 | 144.0 | 2016 | 6.0972 |
| 3.3044 | 145.0 | 2030 | 6.1005 |
| 3.3063 | 146.0 | 2044 | 6.1137 |
| 3.314 | 147.0 | 2058 | 6.1161 |
| 3.2344 | 148.0 | 2072 | 6.1306 |
| 3.2255 | 149.0 | 2086 | 6.1325 |
| 3.2309 | 150.0 | 2100 | 6.1505 |
| 3.2507 | 151.0 | 2114 | 6.1408 |
| 3.1794 | 152.0 | 2128 | 6.1609 |
| 3.1769 | 153.0 | 2142 | 6.1664 |
| 3.2022 | 154.0 | 2156 | 6.1704 |
| 3.1976 | 155.0 | 2170 | 6.1839 |
| 3.2118 | 156.0 | 2184 | 6.1871 |
| 3.1701 | 157.0 | 2198 | 6.1943 |
| 3.1784 | 158.0 | 2212 | 6.2008 |
| 3.157 | 159.0 | 2226 | 6.2035 |
| 3.0987 | 160.0 | 2240 | 6.2117 |
| 3.171 | 161.0 | 2254 | 6.2192 |
| 3.141 | 162.0 | 2268 | 6.2264 |
| 3.1079 | 163.0 | 2282 | 6.2337 |
| 3.1489 | 164.0 | 2296 | 6.2367 |
| 3.1221 | 165.0 | 2310 | 6.2470 |
| 3.1333 | 166.0 | 2324 | 6.2492 |
| 3.1044 | 167.0 | 2338 | 6.2513 |
| 3.1592 | 168.0 | 2352 | 6.2611 |
| 3.1071 | 169.0 | 2366 | 6.2626 |
| 3.1478 | 170.0 | 2380 | 6.2664 |
| 3.0939 | 171.0 | 2394 | 6.2739 |
| 3.1169 | 172.0 | 2408 | 6.2762 |
| 3.0876 | 173.0 | 2422 | 6.2851 |
| 3.0818 | 174.0 | 2436 | 6.2915 |
| 3.0725 | 175.0 | 2450 | 6.2912 |
| 3.0354 | 176.0 | 2464 | 6.2937 |
| 3.0486 | 177.0 | 2478 | 6.2940 |
| 3.0642 | 178.0 | 2492 | 6.2969 |
| 3.0258 | 179.0 | 2506 | 6.3041 |
| 3.0098 | 180.0 | 2520 | 6.3044 |
| 2.9562 | 181.0 | 2534 | 6.3089 |
| 3.0271 | 182.0 | 2548 | 6.3124 |
| 3.0669 | 183.0 | 2562 | 6.3161 |
| 3.0596 | 184.0 | 2576 | 6.3182 |
| 3.0839 | 185.0 | 2590 | 6.3187 |
| 2.9886 | 186.0 | 2604 | 6.3193 |
| 3.0458 | 187.0 | 2618 | 6.3256 |
| 3.0601 | 188.0 | 2632 | 6.3277 |
| 3.011 | 189.0 | 2646 | 6.3301 |
| 2.994 | 190.0 | 2660 | 6.3313 |
| 3.0363 | 191.0 | 2674 | 6.3320 |
| 3.0406 | 192.0 | 2688 | 6.3347 |
| 3.0 | 193.0 | 2702 | 6.3363 |
| 2.9808 | 194.0 | 2716 | 6.3363 |
| 3.0318 | 195.0 | 2730 | 6.3380 |
| 3.0064 | 196.0 | 2744 | 6.3411 |
| 2.9668 | 197.0 | 2758 | 6.3419 |
| 3.0299 | 198.0 | 2772 | 6.3419 |
| 2.9816 | 199.0 | 2786 | 6.3427 |
| 3.024 | 200.0 | 2800 | 6.3437 |
| 2.9643 | 201.0 | 2814 | 6.3449 |
| 2.9604 | 202.0 | 2828 | 6.3454 |
| 2.9981 | 203.0 | 2842 | 6.3454 |
| 3.0149 | 204.0 | 2856 | 6.3473 |
| 2.9935 | 205.0 | 2870 | 6.3464 |
| 2.9979 | 206.0 | 2884 | 6.3470 |
| 2.903 | 207.0 | 2898 | 6.3468 |
| 2.9757 | 208.0 | 2912 | 6.3473 |
| 2.9839 | 209.0 | 2926 | 6.3472 |
| 3.0279 | 210.0 | 2940 | 6.3473 |
| 3.0349 | 211.0 | 2954 | 6.3475 |
| 2.9588 | 212.0 | 2968 | 6.3476 |
| 2.9949 | 213.0 | 2982 | 6.3476 |
| 3.004 | 214.0 | 2996 | 6.3476 |
| 24.0133 | 214.3048 | 3000 | 6.3476 |
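
Note that the validation loss bottoms out at 5.1128 (epoch 53, step 742) and then climbs steadily while the training loss keeps falling, a typical overfitting pattern; the final checkpoint is not the best one by validation loss. A minimal sketch for locating the best checkpoint from such a log (the `log` list is an abridged excerpt of the table above):

```python
# Abridged (epoch, step, validation_loss) rows copied from the table above.
log = [
    (51.0, 714, 5.1139),
    (52.0, 728, 5.1145),
    (53.0, 742, 5.1128),
    (54.0, 756, 5.1134),
    (214.0, 2996, 6.3476),
]

# Select the entry with the lowest validation loss.
best_epoch, best_step, best_loss = min(log, key=lambda row: row[2])
print(f"best checkpoint: epoch {best_epoch:.0f}, step {best_step}, val loss {best_loss}")
# -> best checkpoint: epoch 53, step 742, val loss 5.1128
```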

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
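
To approximate this environment, the versions above can be pinned in a requirements file (a sketch; the +cu121 PyTorch build noted above is typically installed from the PyTorch wheel index rather than PyPI):

```
transformers==4.49.0
torch==2.4.0
datasets==3.4.0
tokenizers==0.21.0
```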

Model size

  • 126M params (F32, Safetensors)