Model | Banking77 Accuracy | Banking77 Precision | Banking77 Recall | Banking77 F1 | FinBench Accuracy | FinBench Precision | FinBench Recall | FinBench F1 | FOMC Accuracy | FOMC Precision | FOMC Recall | FOMC F1 | NumClaim Accuracy | NumClaim Precision | NumClaim Recall | NumClaim F1 | Headlines Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Llama 3 70B Instruct | 0.660 | 0.748 | 0.660 | 0.645 | 0.222 | 0.826 | 0.222 | 0.309 | 0.661 | 0.662 | 0.661 | 0.652 | 0.430 | 0.240 | 0.980 | 0.386 | 0.811 |
Llama 3 8B Instruct | 0.534 | 0.672 | 0.534 | 0.512 | 0.543 | 0.857 | 0.543 | 0.659 | 0.565 | 0.618 | 0.565 | 0.497 | 0.801 | 0.463 | 0.571 | 0.511 | 0.763 |
DBRX Instruct | 0.578 | 0.706 | 0.578 | 0.574 | 0.359 | 0.851 | 0.359 | 0.483 | 0.285 | 0.572 | 0.285 | 0.193 | 0.222 | 0.190 | 1.000 | 0.319 | 0.746 |
DeepSeek LLM (67B) | 0.596 | 0.711 | 0.596 | 0.578 | 0.369 | 0.856 | 0.369 | 0.492 | 0.532 | 0.678 | 0.532 | 0.407 | 0.832 | 1.000 | 0.082 | 0.151 | 0.778 |
Gemma 2 27B | 0.639 | 0.730 | 0.639 | 0.621 | 0.410 | 0.849 | 0.410 | 0.538 | 0.651 | 0.704 | 0.651 | 0.620 | 0.471 | 0.257 | 1.000 | 0.408 | 0.808 |
Gemma 2 9B | 0.630 | 0.710 | 0.630 | 0.609 | 0.412 | 0.848 | 0.412 | 0.541 | 0.595 | 0.694 | 0.595 | 0.519 | 0.371 | 0.224 | 0.990 | 0.365 | 0.856 |
Mistral (7B) Instruct v0.3 | 0.547 | 0.677 | 0.547 | 0.528 | 0.375 | 0.839 | 0.375 | 0.503 | 0.587 | 0.598 | 0.587 | 0.542 | 0.521 | 0.266 | 0.918 | 0.412 | 0.779 |
Mixtral-8x22B Instruct | 0.622 | 0.718 | 0.622 | 0.602 | 0.166 | 0.811 | 0.166 | 0.221 | 0.562 | 0.709 | 0.562 | 0.465 | 0.732 | 0.384 | 0.775 | 0.513 | 0.835 |
Mixtral-8x7B Instruct | 0.567 | 0.693 | 0.567 | 0.547 | 0.285 | 0.838 | 0.285 | 0.396 | 0.623 | 0.636 | 0.623 | 0.603 | 0.765 | 0.431 | 0.898 | 0.583 | 0.805 |
Qwen 2 Instruct (72B) | 0.644 | 0.730 | 0.644 | 0.627 | 0.370 | 0.848 | 0.370 | 0.495 | 0.623 | 0.639 | 0.623 | 0.605 | 0.821 | 0.506 | 0.867 | 0.639 | 0.830 |
WizardLM-2 8x22B | 0.664 | 0.737 | 0.664 | 0.648 | 0.373 | 0.842 | 0.373 | 0.500 | 0.583 | 0.710 | 0.583 | 0.505 | 0.831 | 0.630 | 0.173 | 0.272 | 0.797 |
DeepSeek-V3 | 0.722 | 0.774 | 0.722 | 0.714 | 0.362 | 0.845 | 0.362 | 0.487 | 0.625 | 0.712 | 0.625 | 0.578 | 0.860 | 0.586 | 0.796 | 0.675 | 0.729 |
DeepSeek R1 | 0.772 | 0.789 | 0.772 | 0.763 | 0.306 | 0.846 | 0.306 | 0.419 | 0.679 | 0.682 | 0.679 | 0.670 | 0.851 | 0.557 | 0.898 | 0.688 | 0.769 |
QwQ-32B-Preview | 0.577 | 0.747 | 0.577 | 0.613 | 0.716 | 0.871 | 0.716 | 0.784 | 0.591 | 0.630 | 0.591 | 0.555 | 0.819 | 1.000 | 0.010 | 0.020 | 0.744 |
Jamba 1.5 Mini | 0.528 | 0.630 | 0.528 | 0.508 | 0.913 | 0.883 | 0.913 | 0.898 | 0.572 | 0.678 | 0.572 | 0.499 | 0.812 | 0.429 | 0.092 | 0.151 | 0.682 |
Jamba 1.5 Large | 0.642 | 0.746 | 0.642 | 0.628 | 0.494 | 0.851 | 0.494 | 0.618 | 0.597 | 0.650 | 0.597 | 0.550 | 0.855 | 0.639 | 0.469 | 0.541 | 0.782 |
Claude 3.5 Sonnet | 0.682 | 0.755 | 0.682 | 0.668 | 0.513 | 0.854 | 0.513 | 0.634 | 0.675 | 0.677 | 0.675 | 0.674 | 0.879 | 0.646 | 0.745 | 0.692 | 0.827 |
Claude 3 Haiku | 0.639 | 0.735 | 0.639 | 0.622 | 0.067 | 0.674 | 0.067 | 0.022 | 0.633 | 0.634 | 0.633 | 0.631 | 0.838 | 0.556 | 0.561 | 0.558 | 0.781 |
Cohere Command R 7B | 0.530 | 0.650 | 0.530 | 0.516 | 0.682 | 0.868 | 0.682 | 0.762 | 0.536 | 0.505 | 0.536 | 0.459 | 0.797 | 0.210 | 0.041 | 0.068 | 0.770 |
Cohere Command R+ | 0.660 | 0.747 | 0.660 | 0.651 | 0.575 | 0.859 | 0.575 | 0.684 | 0.526 | 0.655 | 0.526 | 0.393 | 0.804 | 0.333 | 0.071 | 0.118 | 0.812 |
Google Gemini 1.5 Pro | 0.483 | 0.487 | 0.483 | 0.418 | 0.240 | 0.823 | 0.240 | 0.336 | 0.619 | 0.667 | 0.619 | 0.579 | 0.700 | 0.369 | 0.908 | 0.525 | 0.837 |
OpenAI gpt-4o | 0.704 | 0.792 | 0.704 | 0.710 | 0.396 | 0.846 | 0.396 | 0.524 | 0.681 | 0.719 | 0.681 | 0.664 | 0.896 | 0.667 | 0.857 | 0.750 | 0.824 |
OpenAI o1-mini | 0.681 | 0.760 | 0.681 | 0.670 | 0.487 | 0.851 | 0.487 | 0.612 | 0.651 | 0.670 | 0.651 | 0.635 | 0.888 | 0.664 | 0.786 | 0.720 | 0.769 |
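Note that in every dataset block the reported Recall equals Accuracy, which is consistent with support-weighted per-class averaging: the weighted average of per-class recall reduces to overall accuracy. The following is a minimal sketch of how such scores could be computed, assuming scikit-learn; the weighted averaging and the example label lists are illustrative assumptions, not the paper's published evaluation code.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def classification_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision/recall/F1.

    Weighted averaging is assumed here because weighted-average recall
    equals accuracy, matching the pattern in the table above.
    """
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {"Accuracy": acc, "Precision": prec, "Recall": rec, "F1": f1}

# Hypothetical FOMC-style labels, purely for illustration.
gold = ["hawkish", "dovish", "neutral", "hawkish", "dovish"]
pred = ["hawkish", "neutral", "neutral", "hawkish", "dovish"]
print(classification_scores(gold, pred))
```

Under this assumption, `Recall` in the output matches `Accuracy` exactly, as in the table.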