modernbert-llm-router

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unspecified dataset (the auto-generated card did not record its name). It achieves the following results on the evaluation set:

  • Loss: 0.1570
  • F1: 0.6482
  • Macro F1: 0.6482
  • Precision: 0.6799
  • Cross Entropy: 0.8785
  • Min Class Accuracy: 0.528
  • Confusion Matrix: [[888, 109, 3], [383, 528, 89], [134, 320, 546]]
  • Accuracy Class 0: 0.888
  • Accuracy Class 1: 0.528
  • Accuracy Class 2: 0.546
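The per-class accuracies, Min Class Accuracy, and Macro F1 above can be reproduced directly from the reported confusion matrix. A minimal sketch using the standard metric definitions (the row-major true/predicted convention is assumed here, not stated in the card):

```python
# Reproduce the reported metrics from the evaluation confusion matrix.
# Assumed convention: rows are true classes, columns are predicted classes.
matrix = [[888, 109, 3], [383, 528, 89], [134, 320, 546]]

n = len(matrix)
row_sums = [sum(row) for row in matrix]                              # true-class counts
col_sums = [sum(matrix[i][c] for i in range(n)) for c in range(n)]   # predicted counts

# Per-class accuracy (i.e. recall): diagonal entry over row total.
per_class_acc = [matrix[c][c] / row_sums[c] for c in range(n)]

# Per-class F1 simplifies to 2*TP / (row_sum + col_sum); macro F1 is the mean.
f1 = [2 * matrix[c][c] / (row_sums[c] + col_sums[c]) for c in range(n)]
macro_f1 = sum(f1) / n

print(per_class_acc)       # [0.888, 0.528, 0.546]
print(min(per_class_acc))  # 0.528  -> the "Min Class Accuracy" metric
print(round(macro_f1, 4))  # 0.6482
```

The recomputed values match the card's reported Accuracy Class 0/1/2, Min Class Accuracy, and (Macro) F1.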

Model description

More information needed

Intended uses & limitations

More information needed
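Intended use is not documented, but the model name and three-way label space suggest a prompt classifier that routes queries across LLM tiers of different cost or capability. A minimal sketch of such a routing step (the tier names, logits, and `route` helper below are hypothetical illustrations, not part of this repository):

```python
import math

# Hypothetical tier labels for the three classes; the card does not name them.
TIERS = ["small-model", "medium-model", "large-model"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, tiers=TIERS):
    """Pick the tier with the highest class probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return tiers[best], probs[best]

# Example with made-up classifier-head logits:
tier, confidence = route([0.2, 1.7, -0.5])
print(tier)  # medium-model
```

In practice the logits would come from this classifier's head, and the routing threshold could be tuned against the per-class accuracies reported above.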

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
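Note that the total train batch size is the per-device batch size times the accumulation steps: 4 × 16 = 64. A hedged sketch of how these values map onto Hugging Face `TrainingArguments` (a config fragment assuming single-GPU training; `output_dir` is a placeholder, and the AdamW betas/epsilon listed above are the library defaults):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="modernbert-llm-router",  # placeholder path
    learning_rate=3e-07,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,      # 4 * 16 = 64 effective batch size
    lr_scheduler_type="linear",
    num_train_epochs=3,
    optim="adamw_torch",                 # betas=(0.9, 0.999), eps=1e-08 by default
)
```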

Training results

Training Loss Epoch Step Validation Loss F1 Macro F1 Precision Cross Entropy Min Class Accuracy Confusion Matrix Accuracy Class 0 Accuracy Class 1 Accuracy Class 2
3.6672 0.0469 2000 0.2749 0.1994 0.1994 0.6059 1.1097 0.003 [[973, 27, 0], [949, 51, 0], [969, 28, 3]] 0.973 0.051 0.003
3.6053 0.0587 2500 0.2721 0.1932 0.1932 0.4968 1.1049 0.004 [[981, 19, 0], [960, 38, 2], [975, 21, 4]] 0.981 0.038 0.004
3.5157 0.0704 3000 0.2677 0.2129 0.2129 0.5271 1.0985 0.016 [[976, 24, 0], [937, 57, 6], [954, 30, 16]] 0.976 0.057 0.016
3.4239 0.0822 3500 0.2611 0.2567 0.2567 0.5329 1.0879 0.061 [[968, 27, 5], [914, 61, 25], [888, 30, 82]] 0.968 0.061 0.082
3.3098 0.0939 4000 0.2733 0.2384 0.2384 0.5103 1.0886 0.05 [[969, 28, 3], [920, 63, 17], [907, 43, 50]] 0.969 0.063 0.05
3.2217 0.1056 4500 0.2723 0.2861 0.2861 0.5349 1.0766 0.064 [[971, 23, 6], [896, 64, 40], [832, 39, 129]] 0.971 0.064 0.129
2.958 0.1174 5000 0.2598 0.3678 0.3678 0.5286 1.0522 0.103 [[956, 31, 13], [829, 103, 68], [644, 103, 253]] 0.956 0.103 0.253
2.8687 0.1291 5500 0.2531 0.4079 0.4079 0.5194 1.0371 0.165 [[936, 46, 18], [748, 165, 87], [524, 195, 281]] 0.936 0.165 0.281
2.783 0.1408 6000 0.2292 0.4652 0.4652 0.5485 1.0109 0.217 [[930, 53, 17], [672, 217, 111], [419, 217, 364]] 0.93 0.217 0.364
2.6948 0.1526 6500 0.2483 0.4562 0.4562 0.5496 1.0164 0.251 [[935, 51, 14], [666, 251, 83], [413, 285, 302]] 0.935 0.251 0.302
2.6202 0.1643 7000 0.2178 0.5113 0.5113 0.5703 0.9875 0.323 [[914, 70, 16], [572, 323, 105], [279, 346, 375]] 0.914 0.323 0.375
2.4887 0.1760 7500 0.2243 0.5045 0.5045 0.5680 0.9850 0.298 [[924, 62, 14], [598, 298, 104], [297, 326, 377]] 0.924 0.298 0.377
2.4182 0.1878 8000 0.2019 0.5590 0.5590 0.5955 0.9603 0.35 [[902, 76, 22], [506, 350, 144], [235, 283, 482]] 0.902 0.35 0.482
2.3916 0.1995 8500 0.1987 0.5744 0.5744 0.6087 0.9520 0.385 [[898, 79, 23], [474, 385, 141], [230, 281, 489]] 0.898 0.385 0.489
2.309 0.2113 9000 0.2089 0.5648 0.5648 0.6127 0.9659 0.417 [[895, 95, 10], [463, 431, 106], [238, 345, 417]] 0.895 0.431 0.417
2.3453 0.2230 9500 0.2057 0.5755 0.5755 0.6232 0.9630 0.424 [[898, 93, 9], [451, 450, 99], [218, 358, 424]] 0.898 0.45 0.424
2.2393 0.2347 10000 0.2036 0.5701 0.5701 0.6159 0.9600 0.413 [[902, 88, 10], [479, 413, 108], [228, 329, 443]] 0.902 0.413 0.443
2.2557 0.2465 10500 0.1972 0.5764 0.5764 0.6219 0.9539 0.422 [[886, 103, 11], [438, 463, 99], [205, 373, 422]] 0.886 0.463 0.422
2.1826 0.2582 11000 0.2084 0.5463 0.5463 0.6094 0.9636 0.386 [[924, 67, 9], [521, 394, 85], [255, 359, 386]] 0.924 0.394 0.386
2.1356 0.2699 11500 0.2013 0.5804 0.5804 0.6278 0.9521 0.435 [[910, 80, 10], [459, 443, 98], [216, 349, 435]] 0.91 0.443 0.435
2.1075 0.2817 12000 0.1884 0.5852 0.5852 0.6247 0.9432 0.439 [[875, 116, 9], [414, 477, 109], [187, 374, 439]] 0.875 0.477 0.439
2.0963 0.2934 12500 0.1991 0.5773 0.5773 0.6309 0.9483 0.414 [[900, 91, 9], [456, 463, 81], [202, 384, 414]] 0.9 0.463 0.414
2.0256 0.3051 13000 0.2012 0.5747 0.5747 0.6262 0.9461 0.425 [[900, 91, 9], [468, 444, 88], [214, 361, 425]] 0.9 0.444 0.425
2.0341 0.3169 13500 0.2036 0.5835 0.5835 0.6330 0.9475 0.438 [[896, 95, 9], [456, 457, 87], [208, 354, 438]] 0.896 0.457 0.438
2.0004 0.3286 14000 0.1988 0.5831 0.5831 0.6372 0.9422 0.431 [[905, 88, 7], [466, 456, 78], [205, 364, 431]] 0.905 0.456 0.431
1.9904 0.3404 14500 0.1981 0.5813 0.5813 0.6331 0.9374 0.426 [[905, 88, 7], [459, 457, 84], [197, 377, 426]] 0.905 0.457 0.426
1.9248 0.3521 15000 0.1974 0.5886 0.5886 0.6300 0.9385 0.447 [[870, 120, 10], [419, 481, 100], [192, 361, 447]] 0.87 0.481 0.447
1.9778 0.3638 15500 0.1974 0.5922 0.5922 0.6400 0.9378 0.44 [[898, 93, 9], [439, 477, 84], [184, 376, 440]] 0.898 0.477 0.44
1.8983 0.3756 16000 0.1901 0.6029 0.6029 0.6479 0.9304 0.443 [[877, 115, 8], [397, 519, 84], [168, 389, 443]] 0.877 0.519 0.443
1.8844 0.3873 16500 0.1936 0.5919 0.5919 0.6362 0.9258 0.444 [[906, 86, 8], [441, 466, 93], [170, 386, 444]] 0.906 0.466 0.444
1.9149 0.3990 17000 0.1840 0.6005 0.6005 0.6352 0.9269 0.431 [[899, 91, 10], [451, 431, 118], [190, 302, 508]] 0.899 0.431 0.508
1.8763 0.4108 17500 0.1856 0.5904 0.5904 0.6319 0.9256 0.436 [[901, 91, 8], [466, 436, 98], [171, 357, 472]] 0.901 0.436 0.472
1.8572 0.4225 18000 0.1936 0.5691 0.5691 0.6309 0.9335 0.412 [[923, 72, 5], [504, 425, 71], [205, 383, 412]] 0.923 0.425 0.412
1.8527 0.4342 18500 0.1868 0.5985 0.5985 0.6478 0.9266 0.436 [[895, 98, 7], [423, 500, 77], [163, 401, 436]] 0.895 0.5 0.436
1.8157 0.4460 19000 0.2001 0.5833 0.5833 0.6401 0.9345 0.438 [[914, 81, 5], [492, 438, 70], [189, 370, 441]] 0.914 0.438 0.441
1.8342 0.4577 19500 0.1848 0.5952 0.5952 0.6372 0.9178 0.416 [[914, 79, 7], [486, 416, 98], [180, 324, 496]] 0.914 0.416 0.496
1.8161 0.4695 20000 0.1834 0.5976 0.5976 0.6429 0.9200 0.428 [[913, 81, 6], [484, 428, 88], [182, 328, 490]] 0.913 0.428 0.49
1.8287 0.4812 20500 0.1896 0.6024 0.6024 0.6546 0.9252 0.458 [[900, 97, 3], [449, 482, 69], [172, 370, 458]] 0.9 0.482 0.458
1.7956 0.4929 21000 0.1739 0.6133 0.6133 0.6484 0.9062 0.455 [[900, 93, 7], [441, 455, 104], [164, 320, 516]] 0.9 0.455 0.516
1.7941 0.5047 21500 0.1871 0.5908 0.5908 0.6490 0.9208 0.418 [[898, 98, 4], [441, 495, 64], [163, 419, 418]] 0.898 0.495 0.418
1.7451 0.5164 22000 0.1841 0.6200 0.6200 0.6564 0.9181 0.497 [[879, 115, 6], [400, 507, 93], [152, 351, 497]] 0.879 0.507 0.497
1.7314 0.5281 22500 0.1921 0.5787 0.5787 0.6449 0.9252 0.401 [[911, 86, 3], [474, 470, 56], [170, 429, 401]] 0.911 0.47 0.401
1.7192 0.5399 23000 0.1820 0.6059 0.6059 0.6456 0.9105 0.453 [[899, 94, 7], [456, 453, 91], [155, 348, 497]] 0.899 0.453 0.497
1.7412 0.5516 23500 0.1803 0.6137 0.6137 0.6492 0.9130 0.455 [[902, 91, 7], [445, 455, 100], [153, 332, 515]] 0.902 0.455 0.515
1.7121 0.5633 24000 0.1743 0.6074 0.6074 0.6439 0.9082 0.43 [[908, 85, 7], [463, 430, 107], [174, 306, 520]] 0.908 0.43 0.52
1.7407 0.5751 24500 0.1718 0.6060 0.6060 0.6464 0.9057 0.447 [[908, 86, 6], [462, 447, 91], [153, 350, 497]] 0.908 0.447 0.497
1.689 0.5868 25000 0.1870 0.5823 0.5823 0.6467 0.9149 0.424 [[929, 68, 3], [500, 442, 58], [183, 393, 424]] 0.929 0.442 0.424
1.6961 0.5986 25500 0.1737 0.6068 0.6068 0.6472 0.9072 0.435 [[916, 80, 4], [469, 435, 96], [166, 328, 506]] 0.916 0.435 0.506
1.689 0.6103 26000 0.1758 0.6225 0.6225 0.6596 0.9050 0.497 [[884, 111, 5], [415, 497, 88], [145, 346, 509]] 0.884 0.497 0.509
1.694 0.6220 26500 0.1747 0.6089 0.6089 0.6476 0.9020 0.445 [[901, 94, 5], [460, 445, 95], [163, 325, 512]] 0.901 0.445 0.512
1.6739 0.6338 27000 0.1800 0.5987 0.5987 0.6543 0.9131 0.438 [[903, 94, 3], [444, 491, 65], [165, 397, 438]] 0.903 0.491 0.438
1.7086 0.6455 27500 0.1764 0.6079 0.6079 0.6563 0.9035 0.468 [[911, 83, 6], [461, 468, 71], [163, 359, 478]] 0.911 0.468 0.478
1.644 0.6572 28000 0.1806 0.6132 0.6132 0.6596 0.9075 0.469 [[911, 85, 4], [457, 469, 74], [162, 347, 491]] 0.911 0.469 0.491
1.7015 0.6690 28500 0.1729 0.6162 0.6162 0.6587 0.9007 0.445 [[921, 75, 4], [469, 445, 86], [169, 314, 517]] 0.921 0.445 0.517
1.6286 0.6807 29000 0.1721 0.6223 0.6223 0.6665 0.9002 0.49 [[887, 109, 4], [415, 513, 72], [153, 357, 490]] 0.887 0.513 0.49
1.6681 0.6924 29500 0.1896 0.5888 0.5888 0.6560 0.9180 0.415 [[928, 69, 3], [480, 470, 50], [174, 411, 415]] 0.928 0.47 0.415
1.6712 0.7042 30000 0.1756 0.6084 0.6084 0.6606 0.9035 0.452 [[923, 73, 4], [482, 452, 66], [177, 337, 486]] 0.923 0.452 0.486
1.6284 0.7159 30500 0.1732 0.6029 0.6029 0.6616 0.9055 0.46 [[926, 71, 3], [484, 461, 55], [168, 372, 460]] 0.926 0.461 0.46
1.6946 0.7277 31000 0.1732 0.6146 0.6146 0.6668 0.9055 0.465 [[894, 103, 3], [428, 512, 60], [155, 380, 465]] 0.894 0.512 0.465
1.6518 0.7394 31500 0.1817 0.6034 0.6034 0.6599 0.9063 0.45 [[931, 65, 4], [490, 450, 60], [174, 357, 469]] 0.931 0.45 0.469
1.6356 0.7511 32000 0.1748 0.6009 0.6009 0.6582 0.8966 0.449 [[930, 67, 3], [491, 449, 60], [172, 364, 464]] 0.93 0.449 0.464
1.6486 0.7629 32500 0.1773 0.6035 0.6035 0.6604 0.9056 0.444 [[910, 87, 3], [449, 492, 59], [160, 396, 444]] 0.91 0.492 0.444
1.5899 0.7746 33000 0.1729 0.6081 0.6081 0.6557 0.8997 0.439 [[924, 72, 4], [486, 439, 75], [164, 338, 498]] 0.924 0.439 0.498
1.6076 0.7863 33500 0.1833 0.5979 0.5979 0.6564 0.9073 0.434 [[939, 59, 2], [503, 434, 63], [184, 351, 465]] 0.939 0.434 0.465
1.5796 0.7981 34000 0.1682 0.6246 0.6246 0.6703 0.8937 0.484 [[895, 101, 4], [411, 520, 69], [154, 362, 484]] 0.895 0.52 0.484
1.5936 0.8098 34500 0.1580 0.6212 0.6212 0.6628 0.8843 0.458 [[918, 78, 4], [460, 458, 82], [157, 324, 519]] 0.918 0.458 0.519
1.6008 0.8215 35000 0.1764 0.5922 0.5922 0.6540 0.9006 0.406 [[953, 44, 3], [534, 406, 60], [189, 343, 468]] 0.953 0.406 0.468
1.5861 0.8333 35500 0.1611 0.6022 0.6022 0.6586 0.8958 0.438 [[903, 94, 3], [440, 500, 60], [148, 414, 438]] 0.903 0.5 0.438
1.586 0.8450 36000 0.1632 0.6213 0.6213 0.6582 0.8864 0.435 [[917, 79, 4], [469, 435, 96], [156, 299, 545]] 0.917 0.435 0.545
1.5631 0.8568 36500 0.1719 0.6040 0.6040 0.6629 0.8979 0.449 [[918, 79, 3], [463, 482, 55], [168, 383, 449]] 0.918 0.482 0.449
1.584 0.8685 37000 0.1675 0.6097 0.6097 0.6628 0.8934 0.453 [[929, 68, 3], [484, 453, 63], [166, 350, 484]] 0.929 0.453 0.484
1.5881 0.8802 37500 0.1845 0.5910 0.5910 0.6545 0.9042 0.412 [[949, 49, 2], [532, 412, 56], [185, 354, 461]] 0.949 0.412 0.461
1.5851 0.8920 38000 0.1627 0.6334 0.6334 0.6753 0.8916 0.508 [[903, 94, 3], [419, 508, 73], [153, 334, 513]] 0.903 0.508 0.513
1.547 0.9037 38500 0.1644 0.6178 0.6178 0.6572 0.8845 0.427 [[923, 73, 4], [481, 427, 92], [158, 303, 539]] 0.923 0.427 0.539
1.5428 0.9154 39000 0.1652 0.6182 0.6182 0.6636 0.8880 0.451 [[927, 70, 3], [473, 451, 76], [159, 330, 511]] 0.927 0.451 0.511
1.5543 0.9272 39500 0.1594 0.6354 0.6354 0.6691 0.8830 0.491 [[903, 92, 5], [415, 491, 94], [149, 314, 537]] 0.903 0.491 0.537
1.5471 0.9389 40000 0.1655 0.6210 0.6210 0.6607 0.8808 0.442 [[927, 68, 5], [470, 442, 88], [152, 319, 529]] 0.927 0.442 0.529
1.5573 0.9506 40500 0.1665 0.6084 0.6084 0.6592 0.8944 0.442 [[934, 63, 3], [490, 442, 68], [157, 355, 488]] 0.934 0.442 0.488
1.5905 0.9624 41000 0.1610 0.6131 0.6131 0.6656 0.8919 0.45 [[914, 83, 3], [426, 510, 64], [145, 405, 450]] 0.914 0.51 0.45
1.5487 0.9741 41500 0.1633 0.6251 0.6251 0.6699 0.8882 0.476 [[918, 79, 3], [452, 476, 72], [157, 332, 511]] 0.918 0.476 0.511
1.5515 0.9859 42000 0.1629 0.6350 0.6350 0.6733 0.8829 0.51 [[891, 106, 3], [395, 525, 80], [139, 351, 510]] 0.891 0.525 0.51
1.5963 0.9976 42500 0.1638 0.6246 0.6246 0.6727 0.8935 0.47 [[894, 103, 3], [398, 536, 66], [153, 377, 470]] 0.894 0.536 0.47
1.4987 1.0093 43000 0.1754 0.5979 0.5979 0.6558 0.8947 0.424 [[942, 56, 2], [516, 424, 60], [166, 362, 472]] 0.942 0.424 0.472
1.5302 1.0211 43500 0.1590 0.6342 0.6342 0.6730 0.8836 0.506 [[898, 99, 3], [415, 506, 79], [146, 333, 521]] 0.898 0.506 0.521
1.4878 1.0328 44000 0.1686 0.6260 0.6260 0.6702 0.8946 0.495 [[902, 95, 3], [434, 495, 71], [158, 336, 506]] 0.902 0.495 0.506
1.4577 1.0446 44500 0.1789 0.6039 0.6039 0.6642 0.9034 0.442 [[918, 80, 2], [458, 489, 53], [159, 399, 442]] 0.918 0.489 0.442
1.4834 1.0563 45000 0.1573 0.6372 0.6372 0.6730 0.8758 0.499 [[902, 95, 3], [414, 499, 87], [148, 318, 534]] 0.902 0.499 0.534
1.493 1.0680 45500 0.1495 0.6302 0.6302 0.6664 0.8767 0.455 [[920, 77, 3], [453, 455, 92], [146, 308, 546]] 0.92 0.455 0.546
1.4802 1.0798 46000 0.1570 0.6482 0.6482 0.6799 0.8785 0.528 [[888, 109, 3], [383, 528, 89], [134, 320, 546]] 0.888 0.528 0.546
1.4442 1.0915 46500 0.1624 0.6376 0.6376 0.6743 0.8885 0.514 [[893, 105, 2], [404, 514, 82], [138, 336, 526]] 0.893 0.514 0.526
1.4739 1.1032 47000 0.1647 0.6225 0.6225 0.6680 0.8885 0.49 [[914, 83, 3], [439, 490, 71], [146, 361, 493]] 0.914 0.49 0.493
1.4834 1.1150 47500 0.1585 0.6310 0.6310 0.6759 0.8860 0.477 [[881, 116, 3], [376, 556, 68], [131, 392, 477]] 0.881 0.556 0.477
1.497 1.1267 48000 0.1695 0.6343 0.6343 0.6741 0.8919 0.5 [[866, 131, 3], [371, 553, 76], [141, 359, 500]] 0.866 0.553 0.5
1.519 1.1384 48500 0.1653 0.6080 0.6080 0.6590 0.8898 0.424 [[945, 53, 2], [504, 424, 72], [165, 337, 498]] 0.945 0.424 0.498
1.5075 1.1502 49000 0.1515 0.6363 0.6363 0.6696 0.8782 0.5 [[899, 98, 3], [406, 500, 94], [140, 327, 533]] 0.899 0.5 0.533
1.4869 1.1619 49500 0.1596 0.6309 0.6309 0.6709 0.8895 0.496 [[909, 89, 2], [423, 496, 81], [149, 337, 514]] 0.909 0.496 0.514
1.4592 1.1737 50000 0.1609 0.6069 0.6069 0.6570 0.8823 0.452 [[937, 61, 2], [475, 452, 73], [151, 376, 473]] 0.937 0.452 0.473
1.4732 1.1854 50500 0.1594 0.6189 0.6189 0.6681 0.8847 0.48 [[919, 78, 3], [444, 490, 66], [151, 369, 480]] 0.919 0.49 0.48
1.4668 1.1971 51000 0.1689 0.6065 0.6065 0.6663 0.8972 0.435 [[917, 81, 2], [441, 505, 54], [154, 411, 435]] 0.917 0.505 0.435
1.4859 1.2089 51500 0.1585 0.6202 0.6202 0.6676 0.8851 0.469 [[909, 88, 3], [417, 513, 70], [143, 388, 469]] 0.909 0.513 0.469
1.4587 1.2206 52000 0.1642 0.6207 0.6207 0.6681 0.8873 0.461 [[935, 62, 3], [467, 461, 72], [161, 337, 502]] 0.935 0.461 0.502
1.5027 1.2323 52500 0.1674 0.6340 0.6340 0.6805 0.8835 0.478 [[895, 102, 3], [383, 553, 64], [139, 383, 478]] 0.895 0.553 0.478
1.4756 1.2441 53000 0.1572 0.6222 0.6222 0.6681 0.8798 0.487 [[925, 72, 3], [440, 487, 73], [148, 364, 488]] 0.925 0.487 0.488
1.4889 1.2558 53500 0.1655 0.6280 0.6280 0.6734 0.8864 0.474 [[912, 85, 3], [400, 528, 72], [143, 383, 474]] 0.912 0.528 0.474
1.4943 1.2675 54000 0.1605 0.6107 0.6107 0.6635 0.8859 0.467 [[936, 62, 2], [465, 469, 66], [155, 378, 467]] 0.936 0.469 0.467
1.4667 1.2793 54500 0.1772 0.6098 0.6098 0.6651 0.8961 0.418 [[957, 41, 2], [517, 418, 65], [175, 325, 500]] 0.957 0.418 0.5
1.4521 1.2910 55000 0.1654 0.6076 0.6076 0.6662 0.8866 0.442 [[934, 64, 2], [456, 488, 56], [148, 410, 442]] 0.934 0.488 0.442
1.4874 1.3028 55500 0.1607 0.6148 0.6148 0.6639 0.8857 0.431 [[948, 49, 3], [497, 431, 72], [159, 334, 507]] 0.948 0.431 0.507
1.4499 1.3145 56000 0.1682 0.6195 0.6195 0.6688 0.8896 0.464 [[940, 57, 3], [469, 464, 67], [151, 357, 492]] 0.94 0.464 0.492

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1
Model details

  • Downloads last month: 28
  • Model size: 396M params (Safetensors, F32 tensors)
Model tree for Arisp123/modernbert-llm-router

  • Fine-tuned from answerdotai/ModernBERT-large