modernbert-llm-router
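This model is a fine-tuned version of answerdotai/ModernBERT-large. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set, which match the step 46000 row of the training results table: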
- Loss: 0.1570
- F1: 0.6482
- Macro F1: 0.6482
- Precision: 0.6799
- Cross Entropy: 0.8785
- Min Class Accuracy: 0.528
- Confusion Matrix: [[888, 109, 3], [383, 528, 89], [134, 320, 546]]
- Accuracy Class 0: 0.888
- Accuracy Class 1: 0.528
- Accuracy Class 2: 0.546
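The aggregate metrics above can be re-derived from the confusion matrix alone. The sketch below assumes rows are true classes and columns are predicted classes, which is consistent with the reported per-class accuracies (each row sums to 1000). It also assumes that "Precision" and "F1" are macro averages; the card does not state this, but the numbers agree under that reading.

```python
# Confusion matrix copied from the evaluation results above.
# Assumption: rows = true class, columns = predicted class.
confusion = [
    [888, 109, 3],
    [383, 528, 89],
    [134, 320, 546],
]

n = len(confusion)
row_totals = [sum(row) for row in confusion]
col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

# Per-class recall is what the card calls "Accuracy Class k".
recall = [confusion[k][k] / row_totals[k] for k in range(n)]
precision = [confusion[k][k] / col_totals[k] for k in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# Assumption: the card's Precision and F1 are unweighted (macro) averages.
macro_precision = sum(precision) / n
macro_f1 = sum(f1) / n
min_class_accuracy = min(recall)
overall_accuracy = sum(confusion[k][k] for k in range(n)) / sum(row_totals)

print(f"per-class accuracy: {[round(r, 3) for r in recall]}")
print(f"macro precision:    {macro_precision:.4f}")
print(f"macro F1:           {macro_f1:.4f}")
print(f"min class accuracy: {min_class_accuracy:.3f}")
print(f"overall accuracy:   {overall_accuracy:.3f}")
```

The overall accuracy (0.654) is not listed in the card; it follows directly from the diagonal of the matrix. The per-class figures show that class 0 is recognised far more reliably than classes 1 and 2.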
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
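As a quick check on how these values relate, the sketch below reproduces the total train batch size and illustrates what a linear schedule does to the learning rate. The device count, the absence of warmup, and the `total_steps` value in the usage lines are assumptions for illustration; the card states none of them.

```python
# Values copied from the hyperparameter list above.
learning_rate = 3e-07
train_batch_size = 4
gradient_accumulation_steps = 16

# Assumption: a single training device (the card does not give a device count).
num_devices = 1

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 64


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Learning rate under a linear decay to zero.

    Assumption: no warmup steps, since none are listed in the card.
    """
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Hypothetical total step count, used only to show the shape of the schedule.
example_total_steps = 100_000
for step in (0, 50_000, 100_000):
    print(step, linear_lr(step, example_total_steps))
```

With a per-device batch of 4, gradients are accumulated over 16 forward and backward passes before each optimizer update, which gives the listed total of 64 examples per update.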
Training results
Training Loss | Epoch | Step | Validation Loss | F1 | Macro F1 | Precision | Cross Entropy | Min Class Accuracy | Confusion Matrix | Accuracy Class 0 | Accuracy Class 1 | Accuracy Class 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3.6672 | 0.0469 | 2000 | 0.2749 | 0.1994 | 0.1994 | 0.6059 | 1.1097 | 0.003 | [[973, 27, 0], [949, 51, 0], [969, 28, 3]] | 0.973 | 0.051 | 0.003 |
3.6053 | 0.0587 | 2500 | 0.2721 | 0.1932 | 0.1932 | 0.4968 | 1.1049 | 0.004 | [[981, 19, 0], [960, 38, 2], [975, 21, 4]] | 0.981 | 0.038 | 0.004 |
3.5157 | 0.0704 | 3000 | 0.2677 | 0.2129 | 0.2129 | 0.5271 | 1.0985 | 0.016 | [[976, 24, 0], [937, 57, 6], [954, 30, 16]] | 0.976 | 0.057 | 0.016 |
3.4239 | 0.0822 | 3500 | 0.2611 | 0.2567 | 0.2567 | 0.5329 | 1.0879 | 0.061 | [[968, 27, 5], [914, 61, 25], [888, 30, 82]] | 0.968 | 0.061 | 0.082 |
3.3098 | 0.0939 | 4000 | 0.2733 | 0.2384 | 0.2384 | 0.5103 | 1.0886 | 0.05 | [[969, 28, 3], [920, 63, 17], [907, 43, 50]] | 0.969 | 0.063 | 0.05 |
3.2217 | 0.1056 | 4500 | 0.2723 | 0.2861 | 0.2861 | 0.5349 | 1.0766 | 0.064 | [[971, 23, 6], [896, 64, 40], [832, 39, 129]] | 0.971 | 0.064 | 0.129 |
2.958 | 0.1174 | 5000 | 0.2598 | 0.3678 | 0.3678 | 0.5286 | 1.0522 | 0.103 | [[956, 31, 13], [829, 103, 68], [644, 103, 253]] | 0.956 | 0.103 | 0.253 |
2.8687 | 0.1291 | 5500 | 0.2531 | 0.4079 | 0.4079 | 0.5194 | 1.0371 | 0.165 | [[936, 46, 18], [748, 165, 87], [524, 195, 281]] | 0.936 | 0.165 | 0.281 |
2.783 | 0.1408 | 6000 | 0.2292 | 0.4652 | 0.4652 | 0.5485 | 1.0109 | 0.217 | [[930, 53, 17], [672, 217, 111], [419, 217, 364]] | 0.93 | 0.217 | 0.364 |
2.6948 | 0.1526 | 6500 | 0.2483 | 0.4562 | 0.4562 | 0.5496 | 1.0164 | 0.251 | [[935, 51, 14], [666, 251, 83], [413, 285, 302]] | 0.935 | 0.251 | 0.302 |
2.6202 | 0.1643 | 7000 | 0.2178 | 0.5113 | 0.5113 | 0.5703 | 0.9875 | 0.323 | [[914, 70, 16], [572, 323, 105], [279, 346, 375]] | 0.914 | 0.323 | 0.375 |
2.4887 | 0.1760 | 7500 | 0.2243 | 0.5045 | 0.5045 | 0.5680 | 0.9850 | 0.298 | [[924, 62, 14], [598, 298, 104], [297, 326, 377]] | 0.924 | 0.298 | 0.377 |
2.4182 | 0.1878 | 8000 | 0.2019 | 0.5590 | 0.5590 | 0.5955 | 0.9603 | 0.35 | [[902, 76, 22], [506, 350, 144], [235, 283, 482]] | 0.902 | 0.35 | 0.482 |
2.3916 | 0.1995 | 8500 | 0.1987 | 0.5744 | 0.5744 | 0.6087 | 0.9520 | 0.385 | [[898, 79, 23], [474, 385, 141], [230, 281, 489]] | 0.898 | 0.385 | 0.489 |
2.309 | 0.2113 | 9000 | 0.2089 | 0.5648 | 0.5648 | 0.6127 | 0.9659 | 0.417 | [[895, 95, 10], [463, 431, 106], [238, 345, 417]] | 0.895 | 0.431 | 0.417 |
2.3453 | 0.2230 | 9500 | 0.2057 | 0.5755 | 0.5755 | 0.6232 | 0.9630 | 0.424 | [[898, 93, 9], [451, 450, 99], [218, 358, 424]] | 0.898 | 0.45 | 0.424 |
2.2393 | 0.2347 | 10000 | 0.2036 | 0.5701 | 0.5701 | 0.6159 | 0.9600 | 0.413 | [[902, 88, 10], [479, 413, 108], [228, 329, 443]] | 0.902 | 0.413 | 0.443 |
2.2557 | 0.2465 | 10500 | 0.1972 | 0.5764 | 0.5764 | 0.6219 | 0.9539 | 0.422 | [[886, 103, 11], [438, 463, 99], [205, 373, 422]] | 0.886 | 0.463 | 0.422 |
2.1826 | 0.2582 | 11000 | 0.2084 | 0.5463 | 0.5463 | 0.6094 | 0.9636 | 0.386 | [[924, 67, 9], [521, 394, 85], [255, 359, 386]] | 0.924 | 0.394 | 0.386 |
2.1356 | 0.2699 | 11500 | 0.2013 | 0.5804 | 0.5804 | 0.6278 | 0.9521 | 0.435 | [[910, 80, 10], [459, 443, 98], [216, 349, 435]] | 0.91 | 0.443 | 0.435 |
2.1075 | 0.2817 | 12000 | 0.1884 | 0.5852 | 0.5852 | 0.6247 | 0.9432 | 0.439 | [[875, 116, 9], [414, 477, 109], [187, 374, 439]] | 0.875 | 0.477 | 0.439 |
2.0963 | 0.2934 | 12500 | 0.1991 | 0.5773 | 0.5773 | 0.6309 | 0.9483 | 0.414 | [[900, 91, 9], [456, 463, 81], [202, 384, 414]] | 0.9 | 0.463 | 0.414 |
2.0256 | 0.3051 | 13000 | 0.2012 | 0.5747 | 0.5747 | 0.6262 | 0.9461 | 0.425 | [[900, 91, 9], [468, 444, 88], [214, 361, 425]] | 0.9 | 0.444 | 0.425 |
2.0341 | 0.3169 | 13500 | 0.2036 | 0.5835 | 0.5835 | 0.6330 | 0.9475 | 0.438 | [[896, 95, 9], [456, 457, 87], [208, 354, 438]] | 0.896 | 0.457 | 0.438 |
2.0004 | 0.3286 | 14000 | 0.1988 | 0.5831 | 0.5831 | 0.6372 | 0.9422 | 0.431 | [[905, 88, 7], [466, 456, 78], [205, 364, 431]] | 0.905 | 0.456 | 0.431 |
1.9904 | 0.3404 | 14500 | 0.1981 | 0.5813 | 0.5813 | 0.6331 | 0.9374 | 0.426 | [[905, 88, 7], [459, 457, 84], [197, 377, 426]] | 0.905 | 0.457 | 0.426 |
1.9248 | 0.3521 | 15000 | 0.1974 | 0.5886 | 0.5886 | 0.6300 | 0.9385 | 0.447 | [[870, 120, 10], [419, 481, 100], [192, 361, 447]] | 0.87 | 0.481 | 0.447 |
1.9778 | 0.3638 | 15500 | 0.1974 | 0.5922 | 0.5922 | 0.6400 | 0.9378 | 0.44 | [[898, 93, 9], [439, 477, 84], [184, 376, 440]] | 0.898 | 0.477 | 0.44 |
1.8983 | 0.3756 | 16000 | 0.1901 | 0.6029 | 0.6029 | 0.6479 | 0.9304 | 0.443 | [[877, 115, 8], [397, 519, 84], [168, 389, 443]] | 0.877 | 0.519 | 0.443 |
1.8844 | 0.3873 | 16500 | 0.1936 | 0.5919 | 0.5919 | 0.6362 | 0.9258 | 0.444 | [[906, 86, 8], [441, 466, 93], [170, 386, 444]] | 0.906 | 0.466 | 0.444 |
1.9149 | 0.3990 | 17000 | 0.1840 | 0.6005 | 0.6005 | 0.6352 | 0.9269 | 0.431 | [[899, 91, 10], [451, 431, 118], [190, 302, 508]] | 0.899 | 0.431 | 0.508 |
1.8763 | 0.4108 | 17500 | 0.1856 | 0.5904 | 0.5904 | 0.6319 | 0.9256 | 0.436 | [[901, 91, 8], [466, 436, 98], [171, 357, 472]] | 0.901 | 0.436 | 0.472 |
1.8572 | 0.4225 | 18000 | 0.1936 | 0.5691 | 0.5691 | 0.6309 | 0.9335 | 0.412 | [[923, 72, 5], [504, 425, 71], [205, 383, 412]] | 0.923 | 0.425 | 0.412 |
1.8527 | 0.4342 | 18500 | 0.1868 | 0.5985 | 0.5985 | 0.6478 | 0.9266 | 0.436 | [[895, 98, 7], [423, 500, 77], [163, 401, 436]] | 0.895 | 0.5 | 0.436 |
1.8157 | 0.4460 | 19000 | 0.2001 | 0.5833 | 0.5833 | 0.6401 | 0.9345 | 0.438 | [[914, 81, 5], [492, 438, 70], [189, 370, 441]] | 0.914 | 0.438 | 0.441 |
1.8342 | 0.4577 | 19500 | 0.1848 | 0.5952 | 0.5952 | 0.6372 | 0.9178 | 0.416 | [[914, 79, 7], [486, 416, 98], [180, 324, 496]] | 0.914 | 0.416 | 0.496 |
1.8161 | 0.4695 | 20000 | 0.1834 | 0.5976 | 0.5976 | 0.6429 | 0.9200 | 0.428 | [[913, 81, 6], [484, 428, 88], [182, 328, 490]] | 0.913 | 0.428 | 0.49 |
1.8287 | 0.4812 | 20500 | 0.1896 | 0.6024 | 0.6024 | 0.6546 | 0.9252 | 0.458 | [[900, 97, 3], [449, 482, 69], [172, 370, 458]] | 0.9 | 0.482 | 0.458 |
1.7956 | 0.4929 | 21000 | 0.1739 | 0.6133 | 0.6133 | 0.6484 | 0.9062 | 0.455 | [[900, 93, 7], [441, 455, 104], [164, 320, 516]] | 0.9 | 0.455 | 0.516 |
1.7941 | 0.5047 | 21500 | 0.1871 | 0.5908 | 0.5908 | 0.6490 | 0.9208 | 0.418 | [[898, 98, 4], [441, 495, 64], [163, 419, 418]] | 0.898 | 0.495 | 0.418 |
1.7451 | 0.5164 | 22000 | 0.1841 | 0.6200 | 0.6200 | 0.6564 | 0.9181 | 0.497 | [[879, 115, 6], [400, 507, 93], [152, 351, 497]] | 0.879 | 0.507 | 0.497 |
1.7314 | 0.5281 | 22500 | 0.1921 | 0.5787 | 0.5787 | 0.6449 | 0.9252 | 0.401 | [[911, 86, 3], [474, 470, 56], [170, 429, 401]] | 0.911 | 0.47 | 0.401 |
1.7192 | 0.5399 | 23000 | 0.1820 | 0.6059 | 0.6059 | 0.6456 | 0.9105 | 0.453 | [[899, 94, 7], [456, 453, 91], [155, 348, 497]] | 0.899 | 0.453 | 0.497 |
1.7412 | 0.5516 | 23500 | 0.1803 | 0.6137 | 0.6137 | 0.6492 | 0.9130 | 0.455 | [[902, 91, 7], [445, 455, 100], [153, 332, 515]] | 0.902 | 0.455 | 0.515 |
1.7121 | 0.5633 | 24000 | 0.1743 | 0.6074 | 0.6074 | 0.6439 | 0.9082 | 0.43 | [[908, 85, 7], [463, 430, 107], [174, 306, 520]] | 0.908 | 0.43 | 0.52 |
1.7407 | 0.5751 | 24500 | 0.1718 | 0.6060 | 0.6060 | 0.6464 | 0.9057 | 0.447 | [[908, 86, 6], [462, 447, 91], [153, 350, 497]] | 0.908 | 0.447 | 0.497 |
1.689 | 0.5868 | 25000 | 0.1870 | 0.5823 | 0.5823 | 0.6467 | 0.9149 | 0.424 | [[929, 68, 3], [500, 442, 58], [183, 393, 424]] | 0.929 | 0.442 | 0.424 |
1.6961 | 0.5986 | 25500 | 0.1737 | 0.6068 | 0.6068 | 0.6472 | 0.9072 | 0.435 | [[916, 80, 4], [469, 435, 96], [166, 328, 506]] | 0.916 | 0.435 | 0.506 |
1.689 | 0.6103 | 26000 | 0.1758 | 0.6225 | 0.6225 | 0.6596 | 0.9050 | 0.497 | [[884, 111, 5], [415, 497, 88], [145, 346, 509]] | 0.884 | 0.497 | 0.509 |
1.694 | 0.6220 | 26500 | 0.1747 | 0.6089 | 0.6089 | 0.6476 | 0.9020 | 0.445 | [[901, 94, 5], [460, 445, 95], [163, 325, 512]] | 0.901 | 0.445 | 0.512 |
1.6739 | 0.6338 | 27000 | 0.1800 | 0.5987 | 0.5987 | 0.6543 | 0.9131 | 0.438 | [[903, 94, 3], [444, 491, 65], [165, 397, 438]] | 0.903 | 0.491 | 0.438 |
1.7086 | 0.6455 | 27500 | 0.1764 | 0.6079 | 0.6079 | 0.6563 | 0.9035 | 0.468 | [[911, 83, 6], [461, 468, 71], [163, 359, 478]] | 0.911 | 0.468 | 0.478 |
1.644 | 0.6572 | 28000 | 0.1806 | 0.6132 | 0.6132 | 0.6596 | 0.9075 | 0.469 | [[911, 85, 4], [457, 469, 74], [162, 347, 491]] | 0.911 | 0.469 | 0.491 |
1.7015 | 0.6690 | 28500 | 0.1729 | 0.6162 | 0.6162 | 0.6587 | 0.9007 | 0.445 | [[921, 75, 4], [469, 445, 86], [169, 314, 517]] | 0.921 | 0.445 | 0.517 |
1.6286 | 0.6807 | 29000 | 0.1721 | 0.6223 | 0.6223 | 0.6665 | 0.9002 | 0.49 | [[887, 109, 4], [415, 513, 72], [153, 357, 490]] | 0.887 | 0.513 | 0.49 |
1.6681 | 0.6924 | 29500 | 0.1896 | 0.5888 | 0.5888 | 0.6560 | 0.9180 | 0.415 | [[928, 69, 3], [480, 470, 50], [174, 411, 415]] | 0.928 | 0.47 | 0.415 |
1.6712 | 0.7042 | 30000 | 0.1756 | 0.6084 | 0.6084 | 0.6606 | 0.9035 | 0.452 | [[923, 73, 4], [482, 452, 66], [177, 337, 486]] | 0.923 | 0.452 | 0.486 |
1.6284 | 0.7159 | 30500 | 0.1732 | 0.6029 | 0.6029 | 0.6616 | 0.9055 | 0.46 | [[926, 71, 3], [484, 461, 55], [168, 372, 460]] | 0.926 | 0.461 | 0.46 |
1.6946 | 0.7277 | 31000 | 0.1732 | 0.6146 | 0.6146 | 0.6668 | 0.9055 | 0.465 | [[894, 103, 3], [428, 512, 60], [155, 380, 465]] | 0.894 | 0.512 | 0.465 |
1.6518 | 0.7394 | 31500 | 0.1817 | 0.6034 | 0.6034 | 0.6599 | 0.9063 | 0.45 | [[931, 65, 4], [490, 450, 60], [174, 357, 469]] | 0.931 | 0.45 | 0.469 |
1.6356 | 0.7511 | 32000 | 0.1748 | 0.6009 | 0.6009 | 0.6582 | 0.8966 | 0.449 | [[930, 67, 3], [491, 449, 60], [172, 364, 464]] | 0.93 | 0.449 | 0.464 |
1.6486 | 0.7629 | 32500 | 0.1773 | 0.6035 | 0.6035 | 0.6604 | 0.9056 | 0.444 | [[910, 87, 3], [449, 492, 59], [160, 396, 444]] | 0.91 | 0.492 | 0.444 |
1.5899 | 0.7746 | 33000 | 0.1729 | 0.6081 | 0.6081 | 0.6557 | 0.8997 | 0.439 | [[924, 72, 4], [486, 439, 75], [164, 338, 498]] | 0.924 | 0.439 | 0.498 |
1.6076 | 0.7863 | 33500 | 0.1833 | 0.5979 | 0.5979 | 0.6564 | 0.9073 | 0.434 | [[939, 59, 2], [503, 434, 63], [184, 351, 465]] | 0.939 | 0.434 | 0.465 |
1.5796 | 0.7981 | 34000 | 0.1682 | 0.6246 | 0.6246 | 0.6703 | 0.8937 | 0.484 | [[895, 101, 4], [411, 520, 69], [154, 362, 484]] | 0.895 | 0.52 | 0.484 |
1.5936 | 0.8098 | 34500 | 0.1580 | 0.6212 | 0.6212 | 0.6628 | 0.8843 | 0.458 | [[918, 78, 4], [460, 458, 82], [157, 324, 519]] | 0.918 | 0.458 | 0.519 |
1.6008 | 0.8215 | 35000 | 0.1764 | 0.5922 | 0.5922 | 0.6540 | 0.9006 | 0.406 | [[953, 44, 3], [534, 406, 60], [189, 343, 468]] | 0.953 | 0.406 | 0.468 |
1.5861 | 0.8333 | 35500 | 0.1611 | 0.6022 | 0.6022 | 0.6586 | 0.8958 | 0.438 | [[903, 94, 3], [440, 500, 60], [148, 414, 438]] | 0.903 | 0.5 | 0.438 |
1.586 | 0.8450 | 36000 | 0.1632 | 0.6213 | 0.6213 | 0.6582 | 0.8864 | 0.435 | [[917, 79, 4], [469, 435, 96], [156, 299, 545]] | 0.917 | 0.435 | 0.545 |
1.5631 | 0.8568 | 36500 | 0.1719 | 0.6040 | 0.6040 | 0.6629 | 0.8979 | 0.449 | [[918, 79, 3], [463, 482, 55], [168, 383, 449]] | 0.918 | 0.482 | 0.449 |
1.584 | 0.8685 | 37000 | 0.1675 | 0.6097 | 0.6097 | 0.6628 | 0.8934 | 0.453 | [[929, 68, 3], [484, 453, 63], [166, 350, 484]] | 0.929 | 0.453 | 0.484 |
1.5881 | 0.8802 | 37500 | 0.1845 | 0.5910 | 0.5910 | 0.6545 | 0.9042 | 0.412 | [[949, 49, 2], [532, 412, 56], [185, 354, 461]] | 0.949 | 0.412 | 0.461 |
1.5851 | 0.8920 | 38000 | 0.1627 | 0.6334 | 0.6334 | 0.6753 | 0.8916 | 0.508 | [[903, 94, 3], [419, 508, 73], [153, 334, 513]] | 0.903 | 0.508 | 0.513 |
1.547 | 0.9037 | 38500 | 0.1644 | 0.6178 | 0.6178 | 0.6572 | 0.8845 | 0.427 | [[923, 73, 4], [481, 427, 92], [158, 303, 539]] | 0.923 | 0.427 | 0.539 |
1.5428 | 0.9154 | 39000 | 0.1652 | 0.6182 | 0.6182 | 0.6636 | 0.8880 | 0.451 | [[927, 70, 3], [473, 451, 76], [159, 330, 511]] | 0.927 | 0.451 | 0.511 |
1.5543 | 0.9272 | 39500 | 0.1594 | 0.6354 | 0.6354 | 0.6691 | 0.8830 | 0.491 | [[903, 92, 5], [415, 491, 94], [149, 314, 537]] | 0.903 | 0.491 | 0.537 |
1.5471 | 0.9389 | 40000 | 0.1655 | 0.6210 | 0.6210 | 0.6607 | 0.8808 | 0.442 | [[927, 68, 5], [470, 442, 88], [152, 319, 529]] | 0.927 | 0.442 | 0.529 |
1.5573 | 0.9506 | 40500 | 0.1665 | 0.6084 | 0.6084 | 0.6592 | 0.8944 | 0.442 | [[934, 63, 3], [490, 442, 68], [157, 355, 488]] | 0.934 | 0.442 | 0.488 |
1.5905 | 0.9624 | 41000 | 0.1610 | 0.6131 | 0.6131 | 0.6656 | 0.8919 | 0.45 | [[914, 83, 3], [426, 510, 64], [145, 405, 450]] | 0.914 | 0.51 | 0.45 |
1.5487 | 0.9741 | 41500 | 0.1633 | 0.6251 | 0.6251 | 0.6699 | 0.8882 | 0.476 | [[918, 79, 3], [452, 476, 72], [157, 332, 511]] | 0.918 | 0.476 | 0.511 |
1.5515 | 0.9859 | 42000 | 0.1629 | 0.6350 | 0.6350 | 0.6733 | 0.8829 | 0.51 | [[891, 106, 3], [395, 525, 80], [139, 351, 510]] | 0.891 | 0.525 | 0.51 |
1.5963 | 0.9976 | 42500 | 0.1638 | 0.6246 | 0.6246 | 0.6727 | 0.8935 | 0.47 | [[894, 103, 3], [398, 536, 66], [153, 377, 470]] | 0.894 | 0.536 | 0.47 |
1.4987 | 1.0093 | 43000 | 0.1754 | 0.5979 | 0.5979 | 0.6558 | 0.8947 | 0.424 | [[942, 56, 2], [516, 424, 60], [166, 362, 472]] | 0.942 | 0.424 | 0.472 |
1.5302 | 1.0211 | 43500 | 0.1590 | 0.6342 | 0.6342 | 0.6730 | 0.8836 | 0.506 | [[898, 99, 3], [415, 506, 79], [146, 333, 521]] | 0.898 | 0.506 | 0.521 |
1.4878 | 1.0328 | 44000 | 0.1686 | 0.6260 | 0.6260 | 0.6702 | 0.8946 | 0.495 | [[902, 95, 3], [434, 495, 71], [158, 336, 506]] | 0.902 | 0.495 | 0.506 |
1.4577 | 1.0446 | 44500 | 0.1789 | 0.6039 | 0.6039 | 0.6642 | 0.9034 | 0.442 | [[918, 80, 2], [458, 489, 53], [159, 399, 442]] | 0.918 | 0.489 | 0.442 |
1.4834 | 1.0563 | 45000 | 0.1573 | 0.6372 | 0.6372 | 0.6730 | 0.8758 | 0.499 | [[902, 95, 3], [414, 499, 87], [148, 318, 534]] | 0.902 | 0.499 | 0.534 |
1.493 | 1.0680 | 45500 | 0.1495 | 0.6302 | 0.6302 | 0.6664 | 0.8767 | 0.455 | [[920, 77, 3], [453, 455, 92], [146, 308, 546]] | 0.92 | 0.455 | 0.546 |
1.4802 | 1.0798 | 46000 | 0.1570 | 0.6482 | 0.6482 | 0.6799 | 0.8785 | 0.528 | [[888, 109, 3], [383, 528, 89], [134, 320, 546]] | 0.888 | 0.528 | 0.546 |
1.4442 | 1.0915 | 46500 | 0.1624 | 0.6376 | 0.6376 | 0.6743 | 0.8885 | 0.514 | [[893, 105, 2], [404, 514, 82], [138, 336, 526]] | 0.893 | 0.514 | 0.526 |
1.4739 | 1.1032 | 47000 | 0.1647 | 0.6225 | 0.6225 | 0.6680 | 0.8885 | 0.49 | [[914, 83, 3], [439, 490, 71], [146, 361, 493]] | 0.914 | 0.49 | 0.493 |
1.4834 | 1.1150 | 47500 | 0.1585 | 0.6310 | 0.6310 | 0.6759 | 0.8860 | 0.477 | [[881, 116, 3], [376, 556, 68], [131, 392, 477]] | 0.881 | 0.556 | 0.477 |
1.497 | 1.1267 | 48000 | 0.1695 | 0.6343 | 0.6343 | 0.6741 | 0.8919 | 0.5 | [[866, 131, 3], [371, 553, 76], [141, 359, 500]] | 0.866 | 0.553 | 0.5 |
1.519 | 1.1384 | 48500 | 0.1653 | 0.6080 | 0.6080 | 0.6590 | 0.8898 | 0.424 | [[945, 53, 2], [504, 424, 72], [165, 337, 498]] | 0.945 | 0.424 | 0.498 |
1.5075 | 1.1502 | 49000 | 0.1515 | 0.6363 | 0.6363 | 0.6696 | 0.8782 | 0.5 | [[899, 98, 3], [406, 500, 94], [140, 327, 533]] | 0.899 | 0.5 | 0.533 |
1.4869 | 1.1619 | 49500 | 0.1596 | 0.6309 | 0.6309 | 0.6709 | 0.8895 | 0.496 | [[909, 89, 2], [423, 496, 81], [149, 337, 514]] | 0.909 | 0.496 | 0.514 |
1.4592 | 1.1737 | 50000 | 0.1609 | 0.6069 | 0.6069 | 0.6570 | 0.8823 | 0.452 | [[937, 61, 2], [475, 452, 73], [151, 376, 473]] | 0.937 | 0.452 | 0.473 |
1.4732 | 1.1854 | 50500 | 0.1594 | 0.6189 | 0.6189 | 0.6681 | 0.8847 | 0.48 | [[919, 78, 3], [444, 490, 66], [151, 369, 480]] | 0.919 | 0.49 | 0.48 |
1.4668 | 1.1971 | 51000 | 0.1689 | 0.6065 | 0.6065 | 0.6663 | 0.8972 | 0.435 | [[917, 81, 2], [441, 505, 54], [154, 411, 435]] | 0.917 | 0.505 | 0.435 |
1.4859 | 1.2089 | 51500 | 0.1585 | 0.6202 | 0.6202 | 0.6676 | 0.8851 | 0.469 | [[909, 88, 3], [417, 513, 70], [143, 388, 469]] | 0.909 | 0.513 | 0.469 |
1.4587 | 1.2206 | 52000 | 0.1642 | 0.6207 | 0.6207 | 0.6681 | 0.8873 | 0.461 | [[935, 62, 3], [467, 461, 72], [161, 337, 502]] | 0.935 | 0.461 | 0.502 |
1.5027 | 1.2323 | 52500 | 0.1674 | 0.6340 | 0.6340 | 0.6805 | 0.8835 | 0.478 | [[895, 102, 3], [383, 553, 64], [139, 383, 478]] | 0.895 | 0.553 | 0.478 |
1.4756 | 1.2441 | 53000 | 0.1572 | 0.6222 | 0.6222 | 0.6681 | 0.8798 | 0.487 | [[925, 72, 3], [440, 487, 73], [148, 364, 488]] | 0.925 | 0.487 | 0.488 |
1.4889 | 1.2558 | 53500 | 0.1655 | 0.6280 | 0.6280 | 0.6734 | 0.8864 | 0.474 | [[912, 85, 3], [400, 528, 72], [143, 383, 474]] | 0.912 | 0.528 | 0.474 |
1.4943 | 1.2675 | 54000 | 0.1605 | 0.6107 | 0.6107 | 0.6635 | 0.8859 | 0.467 | [[936, 62, 2], [465, 469, 66], [155, 378, 467]] | 0.936 | 0.469 | 0.467 |
1.4667 | 1.2793 | 54500 | 0.1772 | 0.6098 | 0.6098 | 0.6651 | 0.8961 | 0.418 | [[957, 41, 2], [517, 418, 65], [175, 325, 500]] | 0.957 | 0.418 | 0.5 |
1.4521 | 1.2910 | 55000 | 0.1654 | 0.6076 | 0.6076 | 0.6662 | 0.8866 | 0.442 | [[934, 64, 2], [456, 488, 56], [148, 410, 442]] | 0.934 | 0.488 | 0.442 |
1.4874 | 1.3028 | 55500 | 0.1607 | 0.6148 | 0.6148 | 0.6639 | 0.8857 | 0.431 | [[948, 49, 3], [497, 431, 72], [159, 334, 507]] | 0.948 | 0.431 | 0.507 |
1.4499 | 1.3145 | 56000 | 0.1682 | 0.6195 | 0.6195 | 0.6688 | 0.8896 | 0.464 | [[940, 57, 3], [469, 464, 67], [151, 357, 492]] | 0.94 | 0.464 | 0.492 |
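For readers who want to process the table programmatically, the following sketch parses a single row and cross-checks the per-class accuracies against the confusion matrix. The row is copied verbatim from step 46000. The column names follow the table header, and the helper name `parse_row` is hypothetical.

```python
import ast

COLUMNS = [
    "Training Loss", "Epoch", "Step", "Validation Loss", "F1", "Macro F1",
    "Precision", "Cross Entropy", "Min Class Accuracy", "Confusion Matrix",
    "Accuracy Class 0", "Accuracy Class 1", "Accuracy Class 2",
]


def parse_row(row: str) -> dict:
    """Split a pipe-delimited table row into a column-name -> string mapping."""
    cells = [cell.strip() for cell in row.split("|")]
    cells = [cell for cell in cells if cell]  # drop the empty cell after the trailing pipe
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


# Row copied from the table above (step 46000).
row = (
    "1.4802 | 1.0798 | 46000 | 0.1570 | 0.6482 | 0.6482 | 0.6799 | 0.8785 | 0.528 | "
    "[[888, 109, 3], [383, 528, 89], [134, 320, 546]] | 0.888 | 0.528 | 0.546 |"
)

record = parse_row(row)
confusion = ast.literal_eval(record["Confusion Matrix"])

# Recompute per-class accuracy as diagonal / row total and compare with the reported cells.
recomputed = [confusion[k][k] / sum(confusion[k]) for k in range(len(confusion))]
reported = [float(record[f"Accuracy Class {k}"]) for k in range(len(confusion))]

print(int(record["Step"]), recomputed, reported)
```

Applying the same parsing to every row makes it straightforward to rank checkpoints by any column, for example Macro F1 or Min Class Accuracy.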
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.1
- Tokenizers 0.21.1
Model tree for Arisp123/modernbert-llm-router
Base model
answerdotai/ModernBERT-large