Model | FiQA Task 1: MSE | FiQA Task 1: MAE | FiQA Task 1: r² Score | Financial Phrase Bank (FPB): Accuracy | FPB: Precision | FPB: Recall | FPB: F1 | SubjECTive-QA: Precision | SubjECTive-QA: Recall | SubjECTive-QA: F1 | SubjECTive-QA: Accuracy
---|---|---|---|---|---|---|---|---|---|---|---
Llama 3 70B Instruct | 0.123 | 0.290 | 0.272 | 0.901 | 0.904 | 0.901 | 0.902 | 0.652 | 0.573 | 0.535 | 0.573 |
Llama 3 8B Instruct | 0.161 | 0.344 | 0.045 | 0.738 | 0.801 | 0.738 | 0.698 | 0.635 | 0.625 | 0.600 | 0.625 |
DBRX Instruct | 0.160 | 0.321 | 0.052 | 0.524 | 0.727 | 0.524 | 0.499 | 0.654 | 0.541 | 0.436 | 0.541 |
DeepSeek LLM (67B) | 0.118 | 0.278 | 0.302 | 0.815 | 0.867 | 0.815 | 0.811 | 0.676 | 0.544 | 0.462 | 0.544 |
Gemma 2 27B | 0.100 | 0.266 | 0.406 | 0.890 | 0.896 | 0.890 | 0.884 | 0.562 | 0.524 | 0.515 | 0.524 |
Gemma 2 9B | 0.189 | 0.352 | -0.120 | 0.940 | 0.941 | 0.940 | 0.940 | 0.570 | 0.499 | 0.491 | 0.499 |
Mistral (7B) Instruct v0.3 | 0.135 | 0.278 | 0.200 | 0.847 | 0.854 | 0.847 | 0.841 | 0.607 | 0.542 | 0.522 | 0.542 |
Mixtral-8x22B Instruct | 0.221 | 0.364 | -0.310 | 0.768 | 0.845 | 0.768 | 0.776 | 0.614 | 0.538 | 0.510 | 0.538 |
Mixtral-8x7B Instruct | 0.208 | 0.307 | -0.229 | 0.896 | 0.898 | 0.896 | 0.893 | 0.611 | 0.518 | 0.498 | 0.518 |
Qwen 2 Instruct (72B) | 0.205 | 0.409 | -0.212 | 0.904 | 0.908 | 0.904 | 0.901 | 0.644 | 0.601 | 0.576 | 0.601 |
WizardLM-2 8x22B | 0.129 | 0.283 | 0.239 | 0.765 | 0.853 | 0.765 | 0.779 | 0.611 | 0.570 | 0.566 | 0.570 |
DeepSeek-V3 | 0.150 | 0.311 | 0.111 | 0.828 | 0.851 | 0.828 | 0.814 | 0.640 | 0.572 | 0.583 | 0.572 |
DeepSeek R1 | 0.110 | 0.289 | 0.348 | 0.904 | 0.907 | 0.904 | 0.902 | 0.644 | 0.489 | 0.499 | 0.489 |
QwQ-32B-Preview | 0.141 | 0.290 | 0.165 | 0.812 | 0.827 | 0.812 | 0.815 | 0.629 | 0.534 | 0.550 | 0.534 |
Jamba 1.5 Mini | 0.119 | 0.282 | 0.293 | 0.784 | 0.814 | 0.784 | 0.765 | 0.380 | 0.525 | 0.418 | 0.525 |
Jamba 1.5 Large | 0.183 | 0.363 | -0.085 | 0.824 | 0.850 | 0.824 | 0.798 | 0.635 | 0.573 | 0.582 | 0.573 |
Claude 3.5 Sonnet | 0.101 | 0.268 | 0.402 | 0.944 | 0.945 | 0.944 | 0.944 | 0.634 | 0.585 | 0.553 | 0.585 |
Claude 3 Haiku | 0.167 | 0.349 | 0.008 | 0.907 | 0.913 | 0.907 | 0.908 | 0.619 | 0.538 | 0.463 | 0.538 |
Cohere Command R 7B | 0.164 | 0.319 | 0.028 | 0.835 | 0.861 | 0.835 | 0.840 | 0.609 | 0.547 | 0.532 | 0.547 |
Cohere Command R + | 0.106 | 0.274 | 0.373 | 0.741 | 0.806 | 0.741 | 0.699 | 0.608 | 0.547 | 0.533 | 0.547 |
Google Gemini 1.5 Pro | 0.144 | 0.329 | 0.149 | 0.890 | 0.895 | 0.890 | 0.885 | 0.642 | 0.587 | 0.593 | 0.587 |
OpenAI gpt-4o | 0.184 | 0.317 | -0.089 | 0.929 | 0.931 | 0.929 | 0.928 | 0.639 | 0.515 | 0.541 | 0.515 |
OpenAI o1-mini | 0.120 | 0.295 | 0.289 | 0.918 | 0.917 | 0.918 | 0.917 | 0.660 | 0.515 | 0.542 | 0.515 |
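For reference, the metrics reported above (MSE, MAE, and r² for the score-regression FiQA Task 1; accuracy, precision, recall, and F1 for the two classification tasks) can be computed with scikit-learn along the following lines. This is a minimal sketch with hypothetical labels and predictions; the weighted averaging shown for the classification metrics is an assumption about how the multi-class scores were aggregated, not something the table itself specifies.

```python
# Minimal sketch of metric computation with scikit-learn.
# The label values and the "weighted" averaging choice are illustrative assumptions.
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    accuracy_score,
    precision_recall_fscore_support,
)

# FiQA Task 1: continuous sentiment scores -> regression metrics
y_true_reg = [0.35, -0.12, 0.60]   # hypothetical gold sentiment scores
y_pred_reg = [0.30, -0.05, 0.48]   # hypothetical model predictions
mse = mean_squared_error(y_true_reg, y_pred_reg)
mae = mean_absolute_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)   # can be negative for poor fits, as in some rows above

# FPB / SubjECTive-QA: discrete labels -> classification metrics
y_true_cls = ["positive", "neutral", "negative", "neutral"]   # hypothetical gold labels
y_pred_cls = ["positive", "neutral", "neutral", "neutral"]    # hypothetical predictions
acc = accuracy_score(y_true_cls, y_pred_cls)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true_cls, y_pred_cls, average="weighted", zero_division=0
)

print(f"MSE={mse:.3f} MAE={mae:.3f} r2={r2:.3f}")
print(f"Acc={acc:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")
```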