Text Summarization Task Results

All scores are BERTScore Precision, Recall, and F1 on the ECTSum and EDTSum datasets.

| Model | ECTSum Precision | ECTSum Recall | ECTSum F1 | EDTSum Precision | EDTSum Recall | EDTSum F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 3 70B Instruct | 0.715 | 0.801 | 0.754 | 0.793 | 0.844 | 0.817 |
| Llama 3 8B Instruct | 0.724 | 0.796 | 0.757 | 0.785 | 0.841 | 0.811 |
| DBRX Instruct | 0.680 | 0.786 | 0.729 | 0.774 | 0.843 | 0.806 |
| DeepSeek LLM (67B) | 0.692 | 0.678 | 0.681 | 0.779 | 0.840 | 0.807 |
| Gemma 2 27B | 0.680 | 0.777 | 0.723 | 0.801 | 0.829 | 0.814 |
| Gemma 2 9B | 0.651 | 0.531 | 0.585 | 0.803 | 0.833 | 0.817 |
| Mistral (7B) Instruct v0.3 | 0.702 | 0.806 | 0.750 | 0.783 | 0.842 | 0.811 |
| Mixtral-8x22B Instruct | 0.713 | 0.812 | 0.758 | 0.790 | 0.843 | 0.815 |
| Mixtral-8x7B Instruct | 0.727 | 0.773 | 0.747 | 0.785 | 0.839 | 0.810 |
| Qwen 2 Instruct (72B) | 0.709 | 0.804 | 0.752 | 0.781 | 0.846 | 0.811 |
| WizardLM-2 8x22B | 0.677 | 0.806 | 0.735 | 0.774 | 0.847 | 0.808 |
| DeepSeek-V3 | 0.703 | 0.806 | 0.750 | 0.791 | 0.842 | 0.815 |
| DeepSeek R1 | 0.724 | 0.800 | 0.759 | 0.770 | 0.843 | 0.804 |
| QwQ-32B-Preview | 0.653 | 0.751 | 0.696 | 0.797 | 0.841 | 0.817 |
| Jamba 1.5 Mini | 0.692 | 0.798 | 0.741 | 0.798 | 0.838 | 0.816 |
| Jamba 1.5 Large | 0.679 | 0.800 | 0.734 | 0.799 | 0.841 | 0.818 |
| Claude 3.5 Sonnet | 0.737 | 0.802 | 0.767 | 0.786 | 0.843 | 0.813 |
| Claude 3 Haiku | 0.683 | 0.617 | 0.646 | 0.778 | 0.844 | 0.808 |
| Cohere Command R 7B | 0.724 | 0.781 | 0.750 | 0.790 | 0.844 | 0.815 |
| Cohere Command R + | 0.724 | 0.782 | 0.751 | 0.789 | 0.834 | 0.810 |
| Google Gemini 1.5 Pro | 0.757 | 0.800 | 0.777 | 0.800 | 0.836 | 0.817 |
| OpenAI gpt-4o | 0.755 | 0.793 | 0.773 | 0.795 | 0.840 | 0.816 |
| OpenAI o1-mini | 0.731 | 0.801 | 0.763 | 0.795 | 0.840 | 0.816 |

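Since every result above is a BERTScore Precision/Recall/F1 value, scores of this form can be obtained with the `bert-score` package. The snippet below is a minimal sketch only: it assumes the package's default English scoring model, and the candidate/reference summaries are placeholder strings, not the actual ECTSum or EDTSum data; the exact backbone checkpoint and rescaling settings behind the table are not specified here.

```python
# Minimal sketch of computing BERTScore Precision/Recall/F1 for generated summaries.
# Assumes the `bert-score` package (pip install bert-score); candidates and
# references below are placeholders, not the actual ECTSum/EDTSum examples.
from bert_score import score

candidates = [
    "The company reported quarterly revenue of $2.1 billion, up 8% year over year.",
]
references = [
    "Q3 revenue rose 8% year over year to $2.1 billion, beating analyst estimates.",
]

# `score` returns per-example Precision, Recall, and F1 tensors;
# lang="en" selects the default English scoring model.
P, R, F1 = score(candidates, references, lang="en", verbose=False)

# Corpus-level numbers like those in the table are means over all examples.
print(f"Precision: {P.mean():.3f}  Recall: {R.mean():.3f}  F1: {F1.mean():.3f}")
```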