frdirect
This model is a fine-tuned version of facebook/mms-1b-all on an unknown dataset. It achieves the following results on the evaluation set (these values match the final logged evaluation, at step 8500):
- Loss: 0.1592
- Wer: 0.1081
- Bleu: 0.7992
- Rouge: {'rouge1': 0.9142461267493629, 'rouge2': 0.8512223456977356, 'rougeL': 0.9140461108455781, 'rougeLsum': 0.9139759112519872}
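For readers unfamiliar with the Wer figure, word error rate is conventionally the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition for a single utterance. It is not necessarily the implementation used to produce the numbers above: the card does not name the metric library, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("the cat sat", "the cat sit"))
```

Under this definition, a Wer of 0.1081 corresponds to roughly 11 word errors per 100 reference words. Corpus-level tools usually sum errors and reference lengths over all utterances before dividing, rather than averaging per-utterance scores.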
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 30
- mixed_precision_training: Native AMP
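The relationship between these values can be sketched in plain Python. The block below shows how the total train batch size follows from the per-device batch size and gradient accumulation, and how a linear schedule with warmup would shape the learning rate. Two values are assumptions rather than facts stated in the card: a single training device (which is consistent with 8 x 4 = 32), and a total of 8550 optimizer steps (inferred from the results table, where step 5700 falls at epoch 20.0, giving 285 steps per epoch over 30 epochs). The function names are hypothetical.

```python
BASE_LR = 1e-3        # learning_rate
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps
PER_DEVICE_BATCH = 8  # train_batch_size
GRAD_ACCUM = 4        # gradient_accumulation_steps
NUM_DEVICES = 1       # assumption: the card does not state the device count
TOTAL_STEPS = 8550    # assumption: 285 steps/epoch * 30 epochs, inferred from the table


def effective_batch_size(per_device: int, grad_accum: int,
                         num_devices: int = NUM_DEVICES) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))
for step in (0, 50, 100, 4000, 8500):
    print(step, linear_lr(step))
```

With these assumptions the learning rate peaks at 0.001 at step 100 and is close to zero by the last logged evaluation at step 8500.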
Training results
Training Loss | Epoch | Step | Validation Loss | Wer | Bleu | Rouge |
---|---|---|---|---|---|---|
6.0797 | 0.3512 | 100 | 0.3918 | 0.2286 | 0.6250 | {'rouge1': 0.7964874009051572, 'rouge2': 0.6786673339200013, 'rougeL': 0.795684539225159, 'rougeLsum': 0.7956009761818327} |
0.4361 | 0.7024 | 200 | 0.3395 | 0.2025 | 0.6572 | {'rouge1': 0.8286329956380034, 'rouge2': 0.7214521206108135, 'rougeL': 0.8280112126425977, 'rougeLsum': 0.8280818081170968} |
0.4069 | 1.0527 | 300 | 0.2683 | 0.1944 | 0.6691 | {'rouge1': 0.8333662643750593, 'rouge2': 0.7283520260829215, 'rougeL': 0.8329211797586267, 'rougeLsum': 0.8326890662231179} |
0.3738 | 1.4039 | 400 | 0.2500 | 0.1832 | 0.6885 | {'rouge1': 0.8475464903860954, 'rouge2': 0.7490364749206575, 'rougeL': 0.8470167743507384, 'rougeLsum': 0.8469648141276026} |
0.3393 | 1.7550 | 500 | 0.2473 | 0.1806 | 0.6872 | {'rouge1': 0.8487702037730754, 'rouge2': 0.7515019894679165, 'rougeL': 0.8482651202728038, 'rougeLsum': 0.8479933284128696} |
0.3337 | 2.1054 | 600 | 0.2341 | 0.1745 | 0.7015 | {'rouge1': 0.856524582926171, 'rouge2': 0.7615922551676065, 'rougeL': 0.8562090285236332, 'rougeLsum': 0.8560778549085171} |
0.3273 | 2.4565 | 700 | 0.2287 | 0.1776 | 0.6934 | {'rouge1': 0.8518960784546261, 'rouge2': 0.7571574767836958, 'rougeL': 0.8512601033375955, 'rougeLsum': 0.8510232422493931} |
0.3212 | 2.8077 | 800 | 0.2195 | 0.1673 | 0.7067 | {'rouge1': 0.8597006641882449, 'rouge2': 0.7675023646180051, 'rougeL': 0.8592257714231155, 'rougeLsum': 0.8593257522509605} |
0.2989 | 3.1580 | 900 | 0.2214 | 0.1633 | 0.7125 | {'rouge1': 0.8652652267566208, 'rouge2': 0.7751359730996655, 'rougeL': 0.8651950850137953, 'rougeLsum': 0.8648147312388779} |
0.2849 | 3.5092 | 1000 | 0.2176 | 0.1610 | 0.7119 | {'rouge1': 0.8674026972599356, 'rouge2': 0.7767934576217497, 'rougeL': 0.8670611483514326, 'rougeLsum': 0.866923394119187} |
0.321 | 3.8604 | 1100 | 0.2140 | 0.1562 | 0.7203 | {'rouge1': 0.8687120531054047, 'rouge2': 0.7801284964910911, 'rougeL': 0.8685687320059696, 'rougeLsum': 0.8686957534611408} |
0.2901 | 4.2107 | 1200 | 0.2092 | 0.1570 | 0.7233 | {'rouge1': 0.870783939334159, 'rouge2': 0.7846870067295553, 'rougeL': 0.8700614974709336, 'rougeLsum': 0.8703440992984535} |
0.2758 | 4.5619 | 1300 | 0.2208 | 0.1678 | 0.7044 | {'rouge1': 0.8627882593824365, 'rouge2': 0.7732336382659819, 'rougeL': 0.8623362277788245, 'rougeLsum': 0.862456577073438} |
0.2802 | 4.9131 | 1400 | 0.2039 | 0.1547 | 0.7258 | {'rouge1': 0.8731727674477189, 'rouge2': 0.7886178130374446, 'rougeL': 0.872672601389526, 'rougeLsum': 0.8726659641042169} |
0.2638 | 5.2634 | 1500 | 0.2043 | 0.1510 | 0.7335 | {'rouge1': 0.8755955637027819, 'rouge2': 0.7930493884188712, 'rougeL': 0.8751631654870777, 'rougeLsum': 0.8751907751029582} |
0.2752 | 5.6146 | 1600 | 0.2055 | 0.1551 | 0.7270 | {'rouge1': 0.872388381525685, 'rouge2': 0.7875275384987104, 'rougeL': 0.8719998038854011, 'rougeLsum': 0.8716380106368946} |
0.2611 | 5.9658 | 1700 | 0.2000 | 0.1470 | 0.7371 | {'rouge1': 0.8788848516546419, 'rouge2': 0.7961419908259184, 'rougeL': 0.8787077158049774, 'rougeLsum': 0.8785400491349351} |
0.2473 | 6.3161 | 1800 | 0.1964 | 0.1480 | 0.7367 | {'rouge1': 0.8780453998988988, 'rouge2': 0.7968768691849546, 'rougeL': 0.877539022180082, 'rougeLsum': 0.8772607424486614} |
0.2595 | 6.6673 | 1900 | 0.2025 | 0.1480 | 0.7381 | {'rouge1': 0.879639846099505, 'rouge2': 0.797600429803611, 'rougeL': 0.8793686789606971, 'rougeLsum': 0.8790549352654082} |
0.2689 | 7.0176 | 2000 | 0.1969 | 0.1432 | 0.7430 | {'rouge1': 0.881797326390697, 'rouge2': 0.8004647695765528, 'rougeL': 0.8813203554835087, 'rougeLsum': 0.8811828285656307} |
0.246 | 7.3687 | 2100 | 0.1963 | 0.1449 | 0.7398 | {'rouge1': 0.8817110807418125, 'rouge2': 0.8017781199834159, 'rougeL': 0.8815737656302565, 'rougeLsum': 0.8813654064210932} |
0.2502 | 7.7199 | 2200 | 0.1925 | 0.1492 | 0.7347 | {'rouge1': 0.8793953293462229, 'rouge2': 0.7995783364307951, 'rougeL': 0.8789946756811035, 'rougeLsum': 0.8792323972067102} |
0.2355 | 8.0702 | 2300 | 0.1912 | 0.1402 | 0.7460 | {'rouge1': 0.8848122766361803, 'rouge2': 0.805493594353921, 'rougeL': 0.8843513031942714, 'rougeLsum': 0.8847078352624693} |
0.2366 | 8.4214 | 2400 | 0.1885 | 0.1412 | 0.7426 | {'rouge1': 0.8840181215146308, 'rouge2': 0.8027935769840546, 'rougeL': 0.8836147367072817, 'rougeLsum': 0.8836426860532026} |
0.2407 | 8.7726 | 2500 | 0.1918 | 0.1447 | 0.7397 | {'rouge1': 0.8821736207296971, 'rouge2': 0.801567427041519, 'rougeL': 0.881521485738069, 'rougeLsum': 0.88187487496196} |
0.2387 | 9.1229 | 2600 | 0.1903 | 0.1334 | 0.7605 | {'rouge1': 0.8897429883784935, 'rouge2': 0.8137161586716353, 'rougeL': 0.8892440561394621, 'rougeLsum': 0.8891549602299966} |
0.2347 | 9.4741 | 2700 | 0.1834 | 0.1345 | 0.7572 | {'rouge1': 0.8881942208247308, 'rouge2': 0.8119581058011768, 'rougeL': 0.8879981514910154, 'rougeLsum': 0.887901612349636} |
0.2271 | 9.8253 | 2800 | 0.1858 | 0.1335 | 0.7590 | {'rouge1': 0.8911506791113586, 'rouge2': 0.8148539957097256, 'rougeL': 0.8905751461168572, 'rougeLsum': 0.890902060551137} |
0.2303 | 10.1756 | 2900 | 0.1858 | 0.1365 | 0.7501 | {'rouge1': 0.887481336474667, 'rouge2': 0.8085757416110948, 'rougeL': 0.887243114877972, 'rougeLsum': 0.8871195261439274} |
0.2284 | 10.5268 | 3000 | 0.1873 | 0.1348 | 0.7543 | {'rouge1': 0.8896704902968307, 'rouge2': 0.8119666309133653, 'rougeL': 0.8895050087347427, 'rougeLsum': 0.8893208588598065} |
0.2177 | 10.8780 | 3100 | 0.1899 | 0.1412 | 0.7429 | {'rouge1': 0.883841093950449, 'rouge2': 0.8027541651011862, 'rougeL': 0.8832016518265267, 'rougeLsum': 0.8831940672343328} |
0.2259 | 11.2283 | 3200 | 0.1841 | 0.1382 | 0.7505 | {'rouge1': 0.8861091671648993, 'rouge2': 0.8076445498910136, 'rougeL': 0.8860146387775966, 'rougeLsum': 0.8856781536323487} |
0.2183 | 11.5795 | 3300 | 0.1803 | 0.1333 | 0.7589 | {'rouge1': 0.8917208112217359, 'rouge2': 0.8180400270633807, 'rougeL': 0.8911859595950976, 'rougeLsum': 0.8913321406932759} |
0.2124 | 11.9306 | 3400 | 0.1826 | 0.1309 | 0.7626 | {'rouge1': 0.8924689635777846, 'rouge2': 0.8186471633950445, 'rougeL': 0.891927412819415, 'rougeLsum': 0.8920422701086449} |
0.1961 | 12.2809 | 3500 | 0.1824 | 0.1300 | 0.7648 | {'rouge1': 0.8947275911029863, 'rouge2': 0.8218776029324886, 'rougeL': 0.8942104883105186, 'rougeLsum': 0.894350829557474} |
0.2121 | 12.6321 | 3600 | 0.1792 | 0.1278 | 0.7649 | {'rouge1': 0.8965227623557459, 'rouge2': 0.8223749938722336, 'rougeL': 0.8961352103229281, 'rougeLsum': 0.8959098417776623} |
0.2087 | 12.9833 | 3700 | 0.1767 | 0.1294 | 0.7648 | {'rouge1': 0.8960785872160766, 'rouge2': 0.8226243103661531, 'rougeL': 0.8959515880736657, 'rougeLsum': 0.8959147567781259} |
0.1943 | 13.3336 | 3800 | 0.1801 | 0.1288 | 0.7644 | {'rouge1': 0.8947704603138941, 'rouge2': 0.8196022891096679, 'rougeL': 0.8945419678439788, 'rougeLsum': 0.8941955883885566} |
0.2053 | 13.6848 | 3900 | 0.1732 | 0.1269 | 0.7682 | {'rouge1': 0.8953389907409508, 'rouge2': 0.8204708771662934, 'rougeL': 0.8942418907803242, 'rougeLsum': 0.8943319307650137} |
0.2196 | 14.0351 | 4000 | 0.1722 | 0.1258 | 0.7704 | {'rouge1': 0.8971096393899529, 'rouge2': 0.8245173641233066, 'rougeL': 0.8968754492659268, 'rougeLsum': 0.8967885844102701} |
0.1996 | 14.3863 | 4100 | 0.1746 | 0.1283 | 0.7663 | {'rouge1': 0.8992149614052899, 'rouge2': 0.8283870278179525, 'rougeL': 0.8990034855026199, 'rougeLsum': 0.899113705236827} |
0.2028 | 14.7375 | 4200 | 0.1723 | 0.1258 | 0.7688 | {'rouge1': 0.8981221304907357, 'rouge2': 0.8268085111954614, 'rougeL': 0.8978730997154603, 'rougeLsum': 0.8978255561942574} |
0.1784 | 15.0878 | 4300 | 0.1741 | 0.1210 | 0.7777 | {'rouge1': 0.9022989643269592, 'rouge2': 0.8321097618769113, 'rougeL': 0.9019679668540621, 'rougeLsum': 0.9020840316410879} |
0.1954 | 15.4390 | 4400 | 0.1746 | 0.1234 | 0.7748 | {'rouge1': 0.8990459493090655, 'rouge2': 0.8287806337458845, 'rougeL': 0.8990074200510402, 'rougeLsum': 0.8988288738757491} |
0.1916 | 15.7902 | 4500 | 0.1719 | 0.1230 | 0.7761 | {'rouge1': 0.900488872562492, 'rouge2': 0.8307065830708865, 'rougeL': 0.9000372599293843, 'rougeLsum': 0.9002263530831652} |
0.1883 | 16.1405 | 4600 | 0.1712 | 0.1226 | 0.7757 | {'rouge1': 0.9019026814628661, 'rouge2': 0.8329976152495208, 'rougeL': 0.9016592566182122, 'rougeLsum': 0.9016282070055117} |
0.1832 | 16.4917 | 4700 | 0.1713 | 0.1248 | 0.7733 | {'rouge1': 0.8995223210908226, 'rouge2': 0.8290222714943427, 'rougeL': 0.8994032458040973, 'rougeLsum': 0.899366836054343} |
0.1888 | 16.8428 | 4800 | 0.1698 | 0.1264 | 0.7721 | {'rouge1': 0.8982535964067325, 'rouge2': 0.8288396477969829, 'rougeL': 0.8977273104751539, 'rougeLsum': 0.8976965343038692} |
0.1857 | 17.1932 | 4900 | 0.1718 | 0.1230 | 0.7757 | {'rouge1': 0.9026836932266615, 'rouge2': 0.833487216136216, 'rougeL': 0.9022265966641445, 'rougeLsum': 0.9024265508303717} |
0.1858 | 17.5443 | 5000 | 0.1705 | 0.1204 | 0.7792 | {'rouge1': 0.9046938392928605, 'rouge2': 0.8378406365404705, 'rougeL': 0.904281513678646, 'rougeLsum': 0.9040395290033556} |
0.1838 | 17.8955 | 5100 | 0.1713 | 0.1222 | 0.7773 | {'rouge1': 0.9025388171823945, 'rouge2': 0.8339417592886358, 'rougeL': 0.901982659834949, 'rougeLsum': 0.902191073170105} |
0.1784 | 18.2458 | 5200 | 0.1710 | 0.1228 | 0.7741 | {'rouge1': 0.9027135961066619, 'rouge2': 0.8338803735375095, 'rougeL': 0.9021313465595333, 'rougeLsum': 0.9022241778623963} |
0.1748 | 18.5970 | 5300 | 0.1700 | 0.1205 | 0.7803 | {'rouge1': 0.9028943462107296, 'rouge2': 0.8360887354751103, 'rougeL': 0.9028049282097476, 'rougeLsum': 0.9026754818313738} |
0.1785 | 18.9482 | 5400 | 0.1683 | 0.1191 | 0.7827 | {'rouge1': 0.9058615541527929, 'rouge2': 0.8397686128782502, 'rougeL': 0.905380338988193, 'rougeLsum': 0.9053818296865667} |
0.1715 | 19.2985 | 5500 | 0.1693 | 0.1197 | 0.7813 | {'rouge1': 0.9042951746231659, 'rouge2': 0.836346216169132, 'rougeL': 0.9038200107993153, 'rougeLsum': 0.9038172163450839} |
0.1743 | 19.6497 | 5600 | 0.1656 | 0.1198 | 0.7820 | {'rouge1': 0.9056265375031829, 'rouge2': 0.8389744056310899, 'rougeL': 0.9052424864181929, 'rougeLsum': 0.9052107898473516} |
0.179 | 20.0 | 5700 | 0.1662 | 0.1200 | 0.7813 | {'rouge1': 0.9049957724375504, 'rouge2': 0.838664745594604, 'rougeL': 0.9047378933290081, 'rougeLsum': 0.9047796552290915} |
0.1705 | 20.3512 | 5800 | 0.1671 | 0.1158 | 0.7875 | {'rouge1': 0.9066988025993499, 'rouge2': 0.84144312190399, 'rougeL': 0.9066516406825784, 'rougeLsum': 0.9064192832523982} |
0.1737 | 20.7024 | 5900 | 0.1668 | 0.1191 | 0.7809 | {'rouge1': 0.9044882408830994, 'rouge2': 0.8379831752266652, 'rougeL': 0.9044191300138034, 'rougeLsum': 0.9041602420355832} |
0.161 | 21.0527 | 6000 | 0.1675 | 0.1176 | 0.7855 | {'rouge1': 0.9057981628991186, 'rouge2': 0.8402890351927816, 'rougeL': 0.9055204210299842, 'rougeLsum': 0.9055949937515819} |
0.1634 | 21.4039 | 6100 | 0.1656 | 0.1172 | 0.7849 | {'rouge1': 0.9052591361441602, 'rouge2': 0.8390947395192179, 'rougeL': 0.904901443457665, 'rougeLsum': 0.9049274938729717} |
0.1717 | 21.7550 | 6200 | 0.1655 | 0.1184 | 0.7850 | {'rouge1': 0.9056855411229652, 'rouge2': 0.8391993079429194, 'rougeL': 0.9051256484322957, 'rougeLsum': 0.9054924942388913} |
0.1532 | 22.1054 | 6300 | 0.1640 | 0.1138 | 0.7895 | {'rouge1': 0.908514689307546, 'rouge2': 0.8432231540949471, 'rougeL': 0.9081112639646672, 'rougeLsum': 0.908272589564951} |
0.167 | 22.4565 | 6400 | 0.1626 | 0.1140 | 0.7921 | {'rouge1': 0.9103164260947302, 'rouge2': 0.847605808196412, 'rougeL': 0.9098933635969182, 'rougeLsum': 0.9098438675661122} |
0.1606 | 22.8077 | 6500 | 0.1632 | 0.1115 | 0.7956 | {'rouge1': 0.9112987341936316, 'rouge2': 0.8492302659397843, 'rougeL': 0.9111470792331513, 'rougeLsum': 0.9109445157457727} |
0.1599 | 23.1580 | 6600 | 0.1642 | 0.1108 | 0.7960 | {'rouge1': 0.9118713013856634, 'rouge2': 0.848580538085876, 'rougeL': 0.9114618012803377, 'rougeLsum': 0.9116046463908325} |
0.156 | 23.5092 | 6700 | 0.1634 | 0.1122 | 0.7950 | {'rouge1': 0.9106394674615637, 'rouge2': 0.8474609350248357, 'rougeL': 0.910065118851396, 'rougeLsum': 0.9100662823980902} |
0.1589 | 23.8604 | 6800 | 0.1625 | 0.1130 | 0.7916 | {'rouge1': 0.9103350470563911, 'rouge2': 0.8474738385027315, 'rougeL': 0.9099995608863516, 'rougeLsum': 0.9100397914660747} |
0.1622 | 24.2107 | 6900 | 0.1626 | 0.1117 | 0.7943 | {'rouge1': 0.9133839350790938, 'rouge2': 0.8511081438025545, 'rougeL': 0.9126575984413424, 'rougeLsum': 0.9126505650621592} |
0.1521 | 24.5619 | 7000 | 0.1618 | 0.1109 | 0.7963 | {'rouge1': 0.912211680613469, 'rouge2': 0.8496181639239891, 'rougeL': 0.9117487343663472, 'rougeLsum': 0.9117499155750092} |
0.1503 | 24.9131 | 7100 | 0.1612 | 0.1115 | 0.7945 | {'rouge1': 0.9119319650927245, 'rouge2': 0.8489304675858942, 'rougeL': 0.9115897952726388, 'rougeLsum': 0.9116405544813904} |
0.1504 | 25.2634 | 7200 | 0.1621 | 0.1103 | 0.7957 | {'rouge1': 0.9121880227389143, 'rouge2': 0.8492463401850738, 'rougeL': 0.9118447435959087, 'rougeLsum': 0.9118033384400608} |
0.1519 | 25.6146 | 7300 | 0.1615 | 0.1118 | 0.7931 | {'rouge1': 0.9112498683244998, 'rouge2': 0.8480832804563686, 'rougeL': 0.9107105272229861, 'rougeLsum': 0.9107937803611704} |
0.1479 | 25.9658 | 7400 | 0.1611 | 0.1098 | 0.7974 | {'rouge1': 0.9136242054251107, 'rouge2': 0.8509392563862166, 'rougeL': 0.9130826085424906, 'rougeLsum': 0.9132200366521013} |
0.1437 | 26.3161 | 7500 | 0.1609 | 0.1099 | 0.7958 | {'rouge1': 0.9130313007924953, 'rouge2': 0.8502201629741937, 'rougeL': 0.9126113638303235, 'rougeLsum': 0.9126553187114919} |
0.148 | 26.6673 | 7600 | 0.1609 | 0.1095 | 0.7968 | {'rouge1': 0.9137297115874194, 'rouge2': 0.8504308144873894, 'rougeL': 0.9133787545300738, 'rougeLsum': 0.9133679379590172} |
0.1568 | 27.0176 | 7700 | 0.1607 | 0.1107 | 0.7945 | {'rouge1': 0.9131462089688931, 'rouge2': 0.8499313234434469, 'rougeL': 0.912728356326638, 'rougeLsum': 0.9129359515643884} |
0.1415 | 27.3687 | 7800 | 0.1607 | 0.1089 | 0.7971 | {'rouge1': 0.9137517387960785, 'rouge2': 0.8501495668327003, 'rougeL': 0.913338099417192, 'rougeLsum': 0.9136064287202272} |
0.1496 | 27.7199 | 7900 | 0.1597 | 0.1095 | 0.7966 | {'rouge1': 0.9130600978677069, 'rouge2': 0.8504810140358678, 'rougeL': 0.9127735778942226, 'rougeLsum': 0.9125405596281091} |
0.1492 | 28.0702 | 8000 | 0.1596 | 0.1103 | 0.7955 | {'rouge1': 0.9120508555985649, 'rouge2': 0.8487431673284105, 'rougeL': 0.9118153720293241, 'rougeLsum': 0.9116675598551898} |
0.1433 | 28.4214 | 8100 | 0.1600 | 0.1097 | 0.7967 | {'rouge1': 0.912271157742592, 'rouge2': 0.8491645313481614, 'rougeL': 0.9119383068792002, 'rougeLsum': 0.9120338521552568} |
0.1439 | 28.7726 | 8200 | 0.1591 | 0.1080 | 0.7995 | {'rouge1': 0.9144489684474263, 'rouge2': 0.8511979198969957, 'rougeL': 0.9137012695418792, 'rougeLsum': 0.9139716479376014} |
0.1325 | 29.1229 | 8300 | 0.1598 | 0.1085 | 0.7984 | {'rouge1': 0.9139270928380065, 'rouge2': 0.8507649128122179, 'rougeL': 0.9135264918778438, 'rougeLsum': 0.9135318897680185} |
0.1416 | 29.4741 | 8400 | 0.1595 | 0.1077 | 0.7997 | {'rouge1': 0.9146277490504104, 'rouge2': 0.8513032425483812, 'rougeL': 0.9140852902087768, 'rougeLsum': 0.9140094297264637} |
0.1451 | 29.8253 | 8500 | 0.1592 | 0.1081 | 0.7992 | {'rouge1': 0.9142461267493629, 'rouge2': 0.8512223456977356, 'rougeL': 0.9140461108455781, 'rougeLsum': 0.9139759112519872} |
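The final logged evaluation is not the best one on every metric. As a small illustration, the sketch below copies the step, validation loss, and Wer from the first row and the last four rows of the table and picks the best of them. The variable names are hypothetical, and the card does not say which checkpoint was actually published.

```python
# (step, validation_loss, wer) copied from the first and the last four table rows.
rows = [
    (100, 0.3918, 0.2286),
    (8200, 0.1591, 0.1080),
    (8300, 0.1598, 0.1085),
    (8400, 0.1595, 0.1077),
    (8500, 0.1592, 0.1081),
]

best_by_wer = min(rows, key=lambda r: r[2])   # lowest word error rate
best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss

first, last = rows[0], rows[-1]
relative_wer_reduction = (first[2] - last[2]) / first[2]

print("best Wer:", best_by_wer)
print("best validation loss:", best_by_loss)
print("relative Wer reduction, step 100 to 8500:", round(relative_wer_reduction, 3))
```

Step 8400 has the lowest Wer (0.1077) and step 8200 the lowest validation loss (0.1591); both are also the minima over the full table. Between the first and last evaluations, Wer falls by a little over half.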
Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Model tree for ilyes25/frdirect
Base model
facebook/mms-1b-all