Model | ECTSum BERTScore Precision | ECTSum BERTScore Recall | ECTSum BERTScore F1 | EDTSum BERTScore Precision | EDTSum BERTScore Recall | EDTSum BERTScore F1 |
---|---|---|---|---|---|---|
Llama 3 70B Instruct | 0.715 | 0.801 | 0.754 | 0.793 | 0.844 | 0.817 |
Llama 3 8B Instruct | 0.724 | 0.796 | 0.757 | 0.785 | 0.841 | 0.811 |
DBRX Instruct | 0.680 | 0.786 | 0.729 | 0.774 | 0.843 | 0.806 |
DeepSeek LLM (67B) | 0.692 | 0.678 | 0.681 | 0.779 | 0.840 | 0.807 |
Gemma 2 27B | 0.680 | 0.777 | 0.723 | 0.801 | 0.829 | 0.814 |
Gemma 2 9B | 0.651 | 0.531 | 0.585 | 0.803 | 0.833 | 0.817 |
Mistral (7B) Instruct v0.3 | 0.702 | 0.806 | 0.750 | 0.783 | 0.842 | 0.811 |
Mixtral-8x22B Instruct | 0.713 | 0.812 | 0.758 | 0.790 | 0.843 | 0.815 |
Mixtral-8x7B Instruct | 0.727 | 0.773 | 0.747 | 0.785 | 0.839 | 0.810 |
Qwen 2 Instruct (72B) | 0.709 | 0.804 | 0.752 | 0.781 | 0.846 | 0.811 |
WizardLM-2 8x22B | 0.677 | 0.806 | 0.735 | 0.774 | 0.847 | 0.808 |
DeepSeek-V3 | 0.703 | 0.806 | 0.750 | 0.791 | 0.842 | 0.815 |
DeepSeek R1 | 0.724 | 0.800 | 0.759 | 0.770 | 0.843 | 0.804 |
QwQ-32B-Preview | 0.653 | 0.751 | 0.696 | 0.797 | 0.841 | 0.817 |
Jamba 1.5 Mini | 0.692 | 0.798 | 0.741 | 0.798 | 0.838 | 0.816 |
Jamba 1.5 Large | 0.679 | 0.800 | 0.734 | 0.799 | 0.841 | 0.818 |
Claude 3.5 Sonnet | 0.737 | 0.802 | 0.767 | 0.786 | 0.843 | 0.813 |
Claude 3 Haiku | 0.683 | 0.617 | 0.646 | 0.778 | 0.844 | 0.808 |
Cohere Command R 7B | 0.724 | 0.781 | 0.750 | 0.790 | 0.844 | 0.815 |
Cohere Command R + | 0.724 | 0.782 | 0.751 | 0.789 | 0.834 | 0.810 |
Google Gemini 1.5 Pro | 0.757 | 0.800 | 0.777 | 0.800 | 0.836 | 0.817 |
OpenAI gpt-4o | 0.755 | 0.793 | 0.773 | 0.795 | 0.840 | 0.816 |
OpenAI o1-mini | 0.731 | 0.801 | 0.763 | 0.795 | 0.840 | 0.816 |
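For reference, a minimal sketch of how BERTScore precision, recall, and F1 values like those in the table can be computed with the `bert-score` package. The example candidate/reference texts and the default English model choice are illustrative assumptions, not the exact evaluation pipeline used to produce these results.

```python
# Minimal sketch: BERTScore precision/recall/F1 for generated summaries
# against reference summaries. Requires `pip install bert-score`.
# The texts and default English model below are illustrative assumptions,
# not the setup used for the table above.
from bert_score import score

candidates = [
    "Q3 revenue rose 12% year over year, driven by cloud services.",
]
references = [
    "The company reported a 12% year-over-year increase in third-quarter "
    "revenue, led by its cloud segment.",
]

# score() returns per-example precision, recall, and F1 tensors.
P, R, F1 = score(candidates, references, lang="en", verbose=False)

print(f"Precision: {P.mean().item():.3f}")
print(f"Recall:    {R.mean().item():.3f}")
print(f"F1:        {F1.mean().item():.3f}")
```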