Sentiment Analysis Task Results

| Model | FiQA Task 1 MSE | FiQA Task 1 MAE | FiQA Task 1 r² Score | Financial Phrase Bank (FPB) Accuracy | FPB Precision | FPB Recall | FPB F1 | SubjECTive-QA Precision | SubjECTive-QA Recall | SubjECTive-QA F1 | SubjECTive-QA Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3 70B Instruct | 0.123 | 0.290 | 0.272 | 0.901 | 0.904 | 0.901 | 0.902 | 0.652 | 0.573 | 0.535 | 0.573 |
| Llama 3 8B Instruct | 0.161 | 0.344 | 0.045 | 0.738 | 0.801 | 0.738 | 0.698 | 0.635 | 0.625 | 0.600 | 0.625 |
| DBRX Instruct | 0.160 | 0.321 | 0.052 | 0.524 | 0.727 | 0.524 | 0.499 | 0.654 | 0.541 | 0.436 | 0.541 |
| DeepSeek LLM (67B) | 0.118 | 0.278 | 0.302 | 0.815 | 0.867 | 0.815 | 0.811 | 0.676 | 0.544 | 0.462 | 0.544 |
| Gemma 2 27B | 0.100 | 0.266 | 0.406 | 0.890 | 0.896 | 0.890 | 0.884 | 0.562 | 0.524 | 0.515 | 0.524 |
| Gemma 2 9B | 0.189 | 0.352 | -0.120 | 0.940 | 0.941 | 0.940 | 0.940 | 0.570 | 0.499 | 0.491 | 0.499 |
| Mistral (7B) Instruct v0.3 | 0.135 | 0.278 | 0.200 | 0.847 | 0.854 | 0.847 | 0.841 | 0.607 | 0.542 | 0.522 | 0.542 |
| Mixtral-8x22B Instruct | 0.221 | 0.364 | -0.310 | 0.768 | 0.845 | 0.768 | 0.776 | 0.614 | 0.538 | 0.510 | 0.538 |
| Mixtral-8x7B Instruct | 0.208 | 0.307 | -0.229 | 0.896 | 0.898 | 0.896 | 0.893 | 0.611 | 0.518 | 0.498 | 0.518 |
| Qwen 2 Instruct (72B) | 0.205 | 0.409 | -0.212 | 0.904 | 0.908 | 0.904 | 0.901 | 0.644 | 0.601 | 0.576 | 0.601 |
| WizardLM-2 8x22B | 0.129 | 0.283 | 0.239 | 0.765 | 0.853 | 0.765 | 0.779 | 0.611 | 0.570 | 0.566 | 0.570 |
| DeepSeek-V3 | 0.150 | 0.311 | 0.111 | 0.828 | 0.851 | 0.828 | 0.814 | 0.640 | 0.572 | 0.583 | 0.572 |
| DeepSeek R1 | 0.110 | 0.289 | 0.348 | 0.904 | 0.907 | 0.904 | 0.902 | 0.644 | 0.489 | 0.499 | 0.489 |
| QwQ-32B-Preview | 0.141 | 0.290 | 0.165 | 0.812 | 0.827 | 0.812 | 0.815 | 0.629 | 0.534 | 0.550 | 0.534 |
| Jamba 1.5 Mini | 0.119 | 0.282 | 0.293 | 0.784 | 0.814 | 0.784 | 0.765 | 0.380 | 0.525 | 0.418 | 0.525 |
| Jamba 1.5 Large | 0.183 | 0.363 | -0.085 | 0.824 | 0.850 | 0.824 | 0.798 | 0.635 | 0.573 | 0.582 | 0.573 |
| Claude 3.5 Sonnet | 0.101 | 0.268 | 0.402 | 0.944 | 0.945 | 0.944 | 0.944 | 0.634 | 0.585 | 0.553 | 0.585 |
| Claude 3 Haiku | 0.167 | 0.349 | 0.008 | 0.907 | 0.913 | 0.907 | 0.908 | 0.619 | 0.538 | 0.463 | 0.538 |
| Cohere Command R 7B | 0.164 | 0.319 | 0.028 | 0.835 | 0.861 | 0.835 | 0.840 | 0.609 | 0.547 | 0.532 | 0.547 |
| Cohere Command R + | 0.106 | 0.274 | 0.373 | 0.741 | 0.806 | 0.741 | 0.699 | 0.608 | 0.547 | 0.533 | 0.547 |
| Google Gemini 1.5 Pro | 0.144 | 0.329 | 0.149 | 0.890 | 0.895 | 0.890 | 0.885 | 0.642 | 0.587 | 0.593 | 0.587 |
| OpenAI gpt-4o | 0.184 | 0.317 | -0.089 | 0.929 | 0.931 | 0.929 | 0.928 | 0.639 | 0.515 | 0.541 | 0.515 |
| OpenAI o1-mini | 0.120 | 0.295 | 0.289 | 0.918 | 0.917 | 0.918 | 0.917 | 0.660 | 0.515 | 0.542 | 0.515 |

Note: Color highlighting indicates performance ranking: Best, Strong, Good.
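
For readers who want to sanity-check figures of this kind, the following is a minimal sketch of how the reported metric types could be computed. The source does not state how its scores were calculated, so everything here is an assumption: the function names `regression_metrics` and `weighted_classification_metrics` are hypothetical, the toy inputs are invented for illustration, and support-weighted averaging is a guess. That guess is at least consistent with the table, where Recall and Accuracy coincide in every row for both classification tasks, because support-weighted recall always equals accuracy. The sketch also shows why an r² Score can be negative: it falls below zero whenever predictions have a larger squared error than always predicting the mean of the true values.

```python
from collections import Counter


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, r2) for two equal-length sequences of floats."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # variance baseline
    ss_res = sum(e * e for e in errors)                 # model's squared error
    r2 = 1.0 - ss_res / ss_tot  # negative when worse than predicting the mean
    return mse, mae, r2


def weighted_classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), support-weighted over classes.

    Assumption: classes are weighted by their share of the true labels.
    """
    n = len(y_true)
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Invented toy data, not taken from any of the benchmarks above.
    scores_true = [0.5, -0.2, 0.1, 0.8]
    scores_pred = [-0.5, 0.6, 0.9, -0.1]
    print(regression_metrics(scores_true, scores_pred))

    labels_true = ["pos", "neg", "neu", "neu", "pos", "neu"]
    labels_pred = ["pos", "neu", "neu", "neu", "neg", "neu"]
    print(weighted_classification_metrics(labels_true, labels_pred))
```

Under these assumptions, a lower MSE or MAE and a higher r² Score indicate better regression performance, while the four classification metrics are all higher-is-better.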