Model | FinQA | ConvFinQA | TATQA |
---|---|---|---|
Llama 3 70B Instruct | 0.809 | 0.709 | 0.772 |
Llama 3 8B Instruct | 0.767 | 0.268 | 0.706 |
DBRX Instruct | 0.738 | 0.252 | 0.633 |
DeepSeek LLM (67B) | 0.742 | 0.174 | 0.355 |
Gemma 2 27B | 0.768 | 0.268 | 0.734 |
Gemma 2 9B | 0.779 | 0.292 | 0.750 |
Mistral (7B) Instruct v0.3 | 0.655 | 0.199 | 0.553 |
Mixtral-8x22B Instruct | 0.766 | 0.285 | 0.666 |
Mixtral-8x7B Instruct | 0.611 | 0.315 | 0.501 |
Qwen 2 Instruct (72B) | 0.819 | 0.269 | 0.715 |
WizardLM-2 8x22B | 0.796 | 0.247 | 0.725 |
DeepSeek-V3 | 0.840 | 0.261 | 0.779 |
DeepSeek-R1 | 0.836 | 0.853 | 0.858 |
QwQ-32B-Preview | 0.793 | 0.282 | 0.796 |
Jamba 1.5 Mini | 0.666 | 0.218 | 0.586 |
Jamba 1.5 Large | 0.790 | 0.225 | 0.660 |
Claude 3.5 Sonnet | 0.844 | 0.402 | 0.700 |
Claude 3 Haiku | 0.803 | 0.421 | 0.733 |
Cohere Command R 7B | 0.709 | 0.212 | 0.716 |
Cohere Command R+ | 0.776 | 0.259 | 0.698 |
Google Gemini 1.5 Pro | 0.829 | 0.280 | 0.763 |
OpenAI gpt-4o | 0.836 | 0.749 | 0.754 |
OpenAI o1-mini | 0.799 | 0.840 | 0.698 |
Note: All values are accuracy; higher is better.
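The table lists raw scores only. The short Python sketch below derives per-dataset rankings from them; as an assumption for brevity, only a subset of rows is transcribed, and the remaining models can be added the same way.

```python
# Rank models per dataset from the accuracy table above.
# Scores transcribed from the table: (FinQA, ConvFinQA, TATQA).
# Subset only -- extend the dict with the remaining rows as needed.
scores = {
    "Llama 3 70B Instruct": (0.809, 0.709, 0.772),
    "DeepSeek-V3": (0.840, 0.261, 0.779),
    "DeepSeek-R1": (0.836, 0.853, 0.858),
    "Claude 3.5 Sonnet": (0.844, 0.402, 0.700),
    "OpenAI gpt-4o": (0.836, 0.749, 0.754),
    "OpenAI o1-mini": (0.799, 0.840, 0.698),
}

datasets = ("FinQA", "ConvFinQA", "TATQA")

for i, name in enumerate(datasets):
    # Sort models by accuracy on this dataset, best first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1][i], reverse=True)
    top = ", ".join(f"{model} ({accs[i]:.3f})" for model, accs in ranked[:3])
    print(f"{name}: {top}")
```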