Text Classification Task Results

| Model | Banking77 Acc | Banking77 Prec | Banking77 Rec | Banking77 F1 | FinBench Acc | FinBench Prec | FinBench Rec | FinBench F1 | FOMC Acc | FOMC Prec | FOMC Rec | FOMC F1 | NumClaim Acc | NumClaim Prec | NumClaim Rec | NumClaim F1 | Headlines Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.660 | 0.748 | 0.660 | 0.645 | 0.222 | 0.826 | 0.222 | 0.309 | 0.661 | 0.662 | 0.661 | 0.652 | 0.430 | 0.240 | 0.980 | 0.386 | 0.811 |
| Llama 3 8B Instruct | 0.534 | 0.672 | 0.534 | 0.512 | 0.543 | 0.857 | 0.543 | 0.659 | 0.565 | 0.618 | 0.565 | 0.497 | 0.801 | 0.463 | 0.571 | 0.511 | 0.763 |
| DBRX Instruct | 0.578 | 0.706 | 0.578 | 0.574 | 0.359 | 0.851 | 0.359 | 0.483 | 0.285 | 0.572 | 0.285 | 0.193 | 0.222 | 0.190 | 1.000 | 0.319 | 0.746 |
| DeepSeek LLM (67B) | 0.596 | 0.711 | 0.596 | 0.578 | 0.369 | 0.856 | 0.369 | 0.492 | 0.532 | 0.678 | 0.532 | 0.407 | 0.832 | 1.000 | 0.082 | 0.151 | 0.778 |
| Gemma 2 27B | 0.639 | 0.730 | 0.639 | 0.621 | 0.410 | 0.849 | 0.410 | 0.538 | 0.651 | 0.704 | 0.651 | 0.620 | 0.471 | 0.257 | 1.000 | 0.408 | 0.808 |
| Gemma 2 9B | 0.630 | 0.710 | 0.630 | 0.609 | 0.412 | 0.848 | 0.412 | 0.541 | 0.595 | 0.694 | 0.595 | 0.519 | 0.371 | 0.224 | 0.990 | 0.365 | 0.856 |
| Mistral (7B) Instruct v0.3 | 0.547 | 0.677 | 0.547 | 0.528 | 0.375 | 0.839 | 0.375 | 0.503 | 0.587 | 0.598 | 0.587 | 0.542 | 0.521 | 0.266 | 0.918 | 0.412 | 0.779 |
| Mixtral-8x22B Instruct | 0.622 | 0.718 | 0.622 | 0.602 | 0.166 | 0.811 | 0.166 | 0.221 | 0.562 | 0.709 | 0.562 | 0.465 | 0.732 | 0.384 | 0.775 | 0.513 | 0.835 |
| Mixtral-8x7B Instruct | 0.567 | 0.693 | 0.567 | 0.547 | 0.285 | 0.838 | 0.285 | 0.396 | 0.623 | 0.636 | 0.623 | 0.603 | 0.765 | 0.431 | 0.898 | 0.583 | 0.805 |
| Qwen 2 Instruct (72B) | 0.644 | 0.730 | 0.644 | 0.627 | 0.370 | 0.848 | 0.370 | 0.495 | 0.623 | 0.639 | 0.623 | 0.605 | 0.821 | 0.506 | 0.867 | 0.639 | 0.830 |
| WizardLM-2 8x22B | 0.664 | 0.737 | 0.664 | 0.648 | 0.373 | 0.842 | 0.373 | 0.500 | 0.583 | 0.710 | 0.583 | 0.505 | 0.831 | 0.630 | 0.173 | 0.272 | 0.797 |
| DeepSeek-V3 | 0.722 | 0.774 | 0.722 | 0.714 | 0.362 | 0.845 | 0.362 | 0.487 | 0.625 | 0.712 | 0.625 | 0.578 | 0.860 | 0.586 | 0.796 | 0.675 | 0.729 |
| DeepSeek R1 | 0.772 | 0.789 | 0.772 | 0.763 | 0.306 | 0.846 | 0.306 | 0.419 | 0.679 | 0.682 | 0.679 | 0.670 | 0.851 | 0.557 | 0.898 | 0.688 | 0.769 |
| QwQ-32B-Preview | 0.577 | 0.747 | 0.577 | 0.613 | 0.716 | 0.871 | 0.716 | 0.784 | 0.591 | 0.630 | 0.591 | 0.555 | 0.819 | 1.000 | 0.010 | 0.020 | 0.744 |
| Jamba 1.5 Mini | 0.528 | 0.630 | 0.528 | 0.508 | 0.913 | 0.883 | 0.913 | 0.898 | 0.572 | 0.678 | 0.572 | 0.499 | 0.812 | 0.429 | 0.092 | 0.151 | 0.682 |
| Jamba 1.5 Large | 0.642 | 0.746 | 0.642 | 0.628 | 0.494 | 0.851 | 0.494 | 0.618 | 0.597 | 0.650 | 0.597 | 0.550 | 0.855 | 0.639 | 0.469 | 0.541 | 0.782 |
| Claude 3.5 Sonnet | 0.682 | 0.755 | 0.682 | 0.668 | 0.513 | 0.854 | 0.513 | 0.634 | 0.675 | 0.677 | 0.675 | 0.674 | 0.879 | 0.646 | 0.745 | 0.692 | 0.827 |
| Claude 3 Haiku | 0.639 | 0.735 | 0.639 | 0.622 | 0.067 | 0.674 | 0.067 | 0.022 | 0.633 | 0.634 | 0.633 | 0.631 | 0.838 | 0.556 | 0.561 | 0.558 | 0.781 |
| Cohere Command R 7B | 0.530 | 0.650 | 0.530 | 0.516 | 0.682 | 0.868 | 0.682 | 0.762 | 0.536 | 0.505 | 0.536 | 0.459 | 0.797 | 0.210 | 0.041 | 0.068 | 0.770 |
| Cohere Command R + | 0.660 | 0.747 | 0.660 | 0.651 | 0.575 | 0.859 | 0.575 | 0.684 | 0.526 | 0.655 | 0.526 | 0.393 | 0.804 | 0.333 | 0.071 | 0.118 | 0.812 |
| Google Gemini 1.5 Pro | 0.483 | 0.487 | 0.483 | 0.418 | 0.240 | 0.823 | 0.240 | 0.336 | 0.619 | 0.667 | 0.619 | 0.579 | 0.700 | 0.369 | 0.908 | 0.525 | 0.837 |
| OpenAI gpt-4o | 0.704 | 0.792 | 0.704 | 0.710 | 0.396 | 0.846 | 0.396 | 0.524 | 0.681 | 0.719 | 0.681 | 0.664 | 0.896 | 0.667 | 0.857 | 0.750 | 0.824 |
| OpenAI o1-mini | 0.681 | 0.760 | 0.681 | 0.670 | 0.487 | 0.851 | 0.487 | 0.612 | 0.651 | 0.670 | 0.651 | 0.635 | 0.888 | 0.664 | 0.786 | 0.720 | 0.769 |

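In every row of the table, recall equals accuracy on the multi-class tasks (Banking77, FinBench, FOMC) but not on NumClaim, a pattern consistent with weighted-average metrics for multi-class tasks and positive-class (binary) scoring for NumClaim; the evaluation code itself is not shown in this section. The sketch below is one plausible way such metrics could be computed with scikit-learn; the helper name and example labels are illustrative, not taken from the source.

```python
# Minimal sketch (assumption): one way the reported metrics could be computed.
# The table's pattern (recall == accuracy on the multi-class tasks) matches
# weighted averaging; NumClaim looks like positive-class (binary) scoring.
# The function name and example labels are illustrative, not from the source.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def classification_metrics(y_true, y_pred, binary=False):
    """Return (accuracy, precision, recall, f1) for one classification task."""
    average = "binary" if binary else "weighted"
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    return accuracy_score(y_true, y_pred), precision, recall, f1


# Toy binary example (labels made up for illustration).
acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1], binary=True)
print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
```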