# impossible-llms-french-random
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 6.3476
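For a quick sanity check, the snippet below shows one way to load the checkpoint and compute a per-sentence loss. This is a minimal sketch: the repo id, the probe sentence, and the assumption that this is a causal LM loadable via `AutoModelForCausalLM` are all unverified here.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute the actual hub path of this checkpoint.
model_id = "impossible-llms-french-random"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Any French sentence works as a probe; this one is an arbitrary example.
text = "Le chat dort sur le canapé."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])

# out.loss is the mean token-level cross-entropy; exp(loss) approximates
# the perplexity on this single sentence.
print(f"loss = {out.loss.item():.4f}  ppl ~ {math.exp(out.loss.item()):.1f}")
```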
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 12
- eval_batch_size: 8
- seed: 0
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 384
- total_eval_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 3000
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
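These settings map directly onto the standard `transformers` Trainer API. Below is a hedged reconstruction, not the author's actual training script: the `output_dir` is hypothetical, and "Native AMP" is assumed to mean `fp16=True`.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="impossible-llms-french-random",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,            # 10% of 3000 steps = 300 warmup steps
    max_steps=3000,
    fp16=True,                   # assumed reading of "Native AMP"
    label_smoothing_factor=0.1,
)

# The totals in the list above follow from the per-device numbers on 4 GPUs:
#   total_train_batch_size = 12 per device * 4 devices * 8 accumulation = 384
#   total_eval_batch_size  =  8 per device * 4 devices                  = 32
```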
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
9.7239 | 1.0 | 14 | 9.5940 |
9.0541 | 2.0 | 28 | 8.9362 |
8.4544 | 3.0 | 42 | 8.4159 |
8.1879 | 4.0 | 56 | 8.0582 |
7.7091 | 5.0 | 70 | 7.6960 |
7.4121 | 6.0 | 84 | 7.3008 |
6.7935 | 7.0 | 98 | 6.8889 |
6.4858 | 8.0 | 112 | 6.4966 |
6.153 | 9.0 | 126 | 6.1812 |
6.0399 | 10.0 | 140 | 5.9941 |
5.7798 | 11.0 | 154 | 5.9007 |
5.8275 | 12.0 | 168 | 5.8459 |
5.8608 | 13.0 | 182 | 5.8097 |
5.5822 | 14.0 | 196 | 5.7785 |
5.7961 | 15.0 | 210 | 5.7371 |
5.71 | 16.0 | 224 | 5.7096 |
5.559 | 17.0 | 238 | 5.6688 |
5.5411 | 18.0 | 252 | 5.6417 |
5.5046 | 19.0 | 266 | 5.6033 |
5.6118 | 20.0 | 280 | 5.5782 |
5.5337 | 21.0 | 294 | 5.5429 |
5.3872 | 22.0 | 308 | 5.5196 |
5.4351 | 23.0 | 322 | 5.4914 |
5.3531 | 24.0 | 336 | 5.4664 |
5.3101 | 25.0 | 350 | 5.4374 |
5.4831 | 26.0 | 364 | 5.4167 |
5.3678 | 27.0 | 378 | 5.3939 |
5.3613 | 28.0 | 392 | 5.3693 |
5.1893 | 29.0 | 406 | 5.3492 |
5.0765 | 30.0 | 420 | 5.3295 |
5.214 | 31.0 | 434 | 5.3068 |
5.1524 | 32.0 | 448 | 5.2878 |
5.1647 | 33.0 | 462 | 5.2689 |
5.085 | 34.0 | 476 | 5.2546 |
5.1427 | 35.0 | 490 | 5.2398 |
5.0362 | 36.0 | 504 | 5.2204 |
5.1251 | 37.0 | 518 | 5.2122 |
5.0764 | 38.0 | 532 | 5.1996 |
5.0734 | 39.0 | 546 | 5.1884 |
5.0256 | 40.0 | 560 | 5.1747 |
4.8349 | 41.0 | 574 | 5.1653 |
5.0308 | 42.0 | 588 | 5.1561 |
4.859 | 43.0 | 602 | 5.1471 |
4.8695 | 44.0 | 616 | 5.1402 |
4.9189 | 45.0 | 630 | 5.1360 |
4.8724 | 46.0 | 644 | 5.1297 |
4.7789 | 47.0 | 658 | 5.1279 |
4.8509 | 48.0 | 672 | 5.1230 |
4.8738 | 49.0 | 686 | 5.1192 |
4.8657 | 50.0 | 700 | 5.1171 |
4.8311 | 51.0 | 714 | 5.1139 |
4.6525 | 52.0 | 728 | 5.1145 |
4.738 | 53.0 | 742 | 5.1128 |
4.7458 | 54.0 | 756 | 5.1134 |
4.7314 | 55.0 | 770 | 5.1130 |
4.7124 | 56.0 | 784 | 5.1159 |
4.6434 | 57.0 | 798 | 5.1184 |
4.5187 | 58.0 | 812 | 5.1223 |
4.7165 | 59.0 | 826 | 5.1260 |
4.6123 | 60.0 | 840 | 5.1296 |
4.5566 | 61.0 | 854 | 5.1315 |
4.6051 | 62.0 | 868 | 5.1424 |
4.5273 | 63.0 | 882 | 5.1504 |
4.5302 | 64.0 | 896 | 5.1540 |
4.5082 | 65.0 | 910 | 5.1608 |
4.4846 | 66.0 | 924 | 5.1688 |
4.6435 | 67.0 | 938 | 5.1760 |
4.4568 | 68.0 | 952 | 5.1869 |
4.3656 | 69.0 | 966 | 5.1966 |
4.4936 | 70.0 | 980 | 5.2027 |
4.4011 | 71.0 | 994 | 5.2116 |
4.4574 | 72.0 | 1008 | 5.2247 |
4.3478 | 73.0 | 1022 | 5.2348 |
4.3774 | 74.0 | 1036 | 5.2388 |
4.3929 | 75.0 | 1050 | 5.2574 |
4.2449 | 76.0 | 1064 | 5.2648 |
4.3984 | 77.0 | 1078 | 5.2687 |
4.307 | 78.0 | 1092 | 5.2945 |
4.2001 | 79.0 | 1106 | 5.3104 |
4.1673 | 80.0 | 1120 | 5.3154 |
4.1315 | 81.0 | 1134 | 5.3283 |
4.1405 | 82.0 | 1148 | 5.3336 |
4.1806 | 83.0 | 1162 | 5.3505 |
4.0884 | 84.0 | 1176 | 5.3670 |
4.1216 | 85.0 | 1190 | 5.3833 |
3.9938 | 86.0 | 1204 | 5.3921 |
4.2434 | 87.0 | 1218 | 5.4204 |
4.1055 | 88.0 | 1232 | 5.4208 |
4.056 | 89.0 | 1246 | 5.4324 |
4.1179 | 90.0 | 1260 | 5.4422 |
3.9529 | 91.0 | 1274 | 5.4620 |
3.9277 | 92.0 | 1288 | 5.4791 |
4.0086 | 93.0 | 1302 | 5.4906 |
4.0078 | 94.0 | 1316 | 5.5060 |
3.967 | 95.0 | 1330 | 5.5235 |
3.8827 | 96.0 | 1344 | 5.5325 |
3.9483 | 97.0 | 1358 | 5.5452 |
3.7035 | 98.0 | 1372 | 5.5580 |
3.9345 | 99.0 | 1386 | 5.5694 |
3.7981 | 100.0 | 1400 | 5.5881 |
3.8 | 101.0 | 1414 | 5.5959 |
3.8909 | 102.0 | 1428 | 5.6176 |
3.7846 | 103.0 | 1442 | 5.6213 |
3.7542 | 104.0 | 1456 | 5.6423 |
3.8228 | 105.0 | 1470 | 5.6563 |
3.7579 | 106.0 | 1484 | 5.6698 |
3.7408 | 107.0 | 1498 | 5.6842 |
3.743 | 108.0 | 1512 | 5.6946 |
3.7625 | 109.0 | 1526 | 5.7063 |
3.7048 | 110.0 | 1540 | 5.7205 |
3.7228 | 111.0 | 1554 | 5.7425 |
3.7455 | 112.0 | 1568 | 5.7417 |
3.6156 | 113.0 | 1582 | 5.7529 |
3.6318 | 114.0 | 1596 | 5.7795 |
3.6512 | 115.0 | 1610 | 5.7867 |
3.6084 | 116.0 | 1624 | 5.7983 |
3.6663 | 117.0 | 1638 | 5.8173 |
3.5936 | 118.0 | 1652 | 5.8259 |
3.5861 | 119.0 | 1666 | 5.8341 |
3.6022 | 120.0 | 1680 | 5.8478 |
3.5503 | 121.0 | 1694 | 5.8661 |
3.5486 | 122.0 | 1708 | 5.8782 |
3.5129 | 123.0 | 1722 | 5.8781 |
3.487 | 124.0 | 1736 | 5.8993 |
3.4293 | 125.0 | 1750 | 5.9005 |
3.4851 | 126.0 | 1764 | 5.9226 |
3.4599 | 127.0 | 1778 | 5.9286 |
3.4263 | 128.0 | 1792 | 5.9358 |
3.4926 | 129.0 | 1806 | 5.9525 |
3.4187 | 130.0 | 1820 | 5.9570 |
3.4111 | 131.0 | 1834 | 5.9680 |
3.3141 | 132.0 | 1848 | 5.9796 |
3.3957 | 133.0 | 1862 | 6.0007 |
3.3552 | 134.0 | 1876 | 6.0074 |
3.3251 | 135.0 | 1890 | 6.0152 |
3.3961 | 136.0 | 1904 | 6.0240 |
3.3969 | 137.0 | 1918 | 6.0281 |
3.3388 | 138.0 | 1932 | 6.0427 |
3.3769 | 139.0 | 1946 | 6.0496 |
3.3347 | 140.0 | 1960 | 6.0549 |
3.3017 | 141.0 | 1974 | 6.0707 |
3.3262 | 142.0 | 1988 | 6.0811 |
3.2948 | 143.0 | 2002 | 6.0822 |
3.2483 | 144.0 | 2016 | 6.0972 |
3.3044 | 145.0 | 2030 | 6.1005 |
3.3063 | 146.0 | 2044 | 6.1137 |
3.314 | 147.0 | 2058 | 6.1161 |
3.2344 | 148.0 | 2072 | 6.1306 |
3.2255 | 149.0 | 2086 | 6.1325 |
3.2309 | 150.0 | 2100 | 6.1505 |
3.2507 | 151.0 | 2114 | 6.1408 |
3.1794 | 152.0 | 2128 | 6.1609 |
3.1769 | 153.0 | 2142 | 6.1664 |
3.2022 | 154.0 | 2156 | 6.1704 |
3.1976 | 155.0 | 2170 | 6.1839 |
3.2118 | 156.0 | 2184 | 6.1871 |
3.1701 | 157.0 | 2198 | 6.1943 |
3.1784 | 158.0 | 2212 | 6.2008 |
3.157 | 159.0 | 2226 | 6.2035 |
3.0987 | 160.0 | 2240 | 6.2117 |
3.171 | 161.0 | 2254 | 6.2192 |
3.141 | 162.0 | 2268 | 6.2264 |
3.1079 | 163.0 | 2282 | 6.2337 |
3.1489 | 164.0 | 2296 | 6.2367 |
3.1221 | 165.0 | 2310 | 6.2470 |
3.1333 | 166.0 | 2324 | 6.2492 |
3.1044 | 167.0 | 2338 | 6.2513 |
3.1592 | 168.0 | 2352 | 6.2611 |
3.1071 | 169.0 | 2366 | 6.2626 |
3.1478 | 170.0 | 2380 | 6.2664 |
3.0939 | 171.0 | 2394 | 6.2739 |
3.1169 | 172.0 | 2408 | 6.2762 |
3.0876 | 173.0 | 2422 | 6.2851 |
3.0818 | 174.0 | 2436 | 6.2915 |
3.0725 | 175.0 | 2450 | 6.2912 |
3.0354 | 176.0 | 2464 | 6.2937 |
3.0486 | 177.0 | 2478 | 6.2940 |
3.0642 | 178.0 | 2492 | 6.2969 |
3.0258 | 179.0 | 2506 | 6.3041 |
3.0098 | 180.0 | 2520 | 6.3044 |
2.9562 | 181.0 | 2534 | 6.3089 |
3.0271 | 182.0 | 2548 | 6.3124 |
3.0669 | 183.0 | 2562 | 6.3161 |
3.0596 | 184.0 | 2576 | 6.3182 |
3.0839 | 185.0 | 2590 | 6.3187 |
2.9886 | 186.0 | 2604 | 6.3193 |
3.0458 | 187.0 | 2618 | 6.3256 |
3.0601 | 188.0 | 2632 | 6.3277 |
3.011 | 189.0 | 2646 | 6.3301 |
2.994 | 190.0 | 2660 | 6.3313 |
3.0363 | 191.0 | 2674 | 6.3320 |
3.0406 | 192.0 | 2688 | 6.3347 |
3.0 | 193.0 | 2702 | 6.3363 |
2.9808 | 194.0 | 2716 | 6.3363 |
3.0318 | 195.0 | 2730 | 6.3380 |
3.0064 | 196.0 | 2744 | 6.3411 |
2.9668 | 197.0 | 2758 | 6.3419 |
3.0299 | 198.0 | 2772 | 6.3419 |
2.9816 | 199.0 | 2786 | 6.3427 |
3.024 | 200.0 | 2800 | 6.3437 |
2.9643 | 201.0 | 2814 | 6.3449 |
2.9604 | 202.0 | 2828 | 6.3454 |
2.9981 | 203.0 | 2842 | 6.3454 |
3.0149 | 204.0 | 2856 | 6.3473 |
2.9935 | 205.0 | 2870 | 6.3464 |
2.9979 | 206.0 | 2884 | 6.3470 |
2.903 | 207.0 | 2898 | 6.3468 |
2.9757 | 208.0 | 2912 | 6.3473 |
2.9839 | 209.0 | 2926 | 6.3472 |
3.0279 | 210.0 | 2940 | 6.3473 |
3.0349 | 211.0 | 2954 | 6.3475 |
2.9588 | 212.0 | 2968 | 6.3476 |
2.9949 | 213.0 | 2982 | 6.3476 |
3.004 | 214.0 | 2996 | 6.3476 |
24.0133 | 214.3048 | 3000 | 6.3476 |
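Two things stand out in the table: validation loss bottoms out around 5.11 near epochs 53-55 and then climbs steadily to the final 6.3476, the usual signature of overfitting under a fixed 3000-step schedule. Also, because training used `label_smoothing_factor=0.1`, the reported loss is cross-entropy against smoothed targets rather than hard labels, so converting it to perplexity is only approximate:

```python
import math

eval_loss = 6.3476  # final validation loss from the table above

# Label smoothing inflates the loss relative to hard-target cross-entropy,
# so exp(loss) is only a rough approximation of perplexity here.
print(f"approx. perplexity = {math.exp(eval_loss):.1f}")  # ~571
```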
### Framework versions

- Transformers 4.49.0
- PyTorch 2.4.0+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0