Update README.md

README.md

@@ -76,4 +76,18 @@ This document presents the evaluation results of `DeepSeek-R1-Distill-Llama-70B`

---

## 📊 Detailed Evaluation on MMLU Challenges

| **Metric** | **Value** | **Description** |
|----------------------|-----------|-----------------|
| **MMLU** | `37.88%` | Raw average over MMLU-Stem, MMLU-Social-Sciences, MMLU-Humanities, MMLU-Other |
| **MMLU-Humanities** | `31.83%` | Averaged over MMLU-Formal-Logic, MMLU-Prehistory, MMLU-World-Religions, MMLU-Philosophy, MMLU-High-School-World-History, MMLU-Professional-Law, MMLU-High-School-US-History, MMLU-Logical-Fallacies, MMLU-International-Law, MMLU-High-School-European-History, MMLU-Moral-Disputes, MMLU-Moral-Scenarios, MMLU-Jurisprudence |
| **MMLU-Social-Sciences** | `45.43%` | Averaged over MMLU-Public-Relations, MMLU-Sociology, MMLU-Security-Studies, MMLU-High-School-Government-and-Politics, MMLU-High-School-Psychology, MMLU-Human-Sexuality, MMLU-US-Foreign-Policy, MMLU-High-School-Microeconomics, MMLU-Econometrics, MMLU-High-School-Macroeconomics, MMLU-High-School-Geography, MMLU-Professional-Psychology |
| **MMLU-Stem** | `33.01%` | Averaged over MMLU-Conceptual-Physics, MMLU-High-School-Chemistry, MMLU-College-Biology, MMLU-College-Chemistry, MMLU-Machine-Learning, MMLU-Elementary-Mathematics, MMLU-Abstract-Algebra, MMLU-Astronomy, MMLU-High-School-Statistics, MMLU-Anatomy, MMLU-College-Mathematics, MMLU-Computer-Security, MMLU-College-Computer-Science, MMLU-Electrical-Engineering, MMLU-College-Physics, MMLU-High-School-Computer-Science, MMLU-High-School-Physics, MMLU-High-School-Biology, MMLU-High-School-Mathematics |
| **MMLU-Other** | `44.48%` | Averaged over MMLU-Medical-Genetics, MMLU-Global-Facts, MMLU-Marketing, MMLU-College-Medicine, MMLU-Human-Aging, MMLU-Virology, MMLU-Business-Ethics, MMLU-Clinical-Knowledge, MMLU-Professional-Medicine, MMLU-Nutrition, MMLU-Miscellaneous, MMLU-Professional-Accounting, MMLU-Management |

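
The overall **MMLU** score of 37.88% is not the plain mean of the four category scores, which would be about 38.69%. One plausible reading of "raw average" is accuracy pooled over all questions, i.e. a mean weighted by how many test questions each category contains. The sketch below checks that reading. It assumes the evaluation used the full standard MMLU test split (57 subjects, 14,042 questions). The names `CATEGORY_SIZES`, `macro_average` and `weighted_average` are illustrative and not part of the original README.

```python
# Category accuracies (%) copied from the table above.
CATEGORY_SCORES = {
    "MMLU-Humanities": 31.83,
    "MMLU-Social-Sciences": 45.43,
    "MMLU-Stem": 33.01,
    "MMLU-Other": 44.48,
}

# Question counts per category in the standard MMLU test split.
# Assumption: the evaluation ran on the full split with these sizes.
CATEGORY_SIZES = {
    "MMLU-Humanities": 4705,
    "MMLU-Social-Sciences": 3077,
    "MMLU-Stem": 3153,
    "MMLU-Other": 3107,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean: every category counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores: dict[str, float], sizes: dict[str, int]) -> float:
    """Mean weighted by question count, equivalent to pooling all questions."""
    total = sum(sizes[name] for name in scores)
    return sum(scores[name] * sizes[name] for name in scores) / total


if __name__ == "__main__":
    print(f"unweighted mean: {macro_average(CATEGORY_SCORES):.4f}%")
    print(f"question-weighted mean: {weighted_average(CATEGORY_SCORES, CATEGORY_SIZES):.4f}%")
```

Under these assumptions the question-weighted mean comes out at about 37.87%, which matches the reported 37.88% up to the rounding of the category scores. If the run used a subsample or a different split, the sizes would need to be replaced with the actual per-category question counts.
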
📌 Let us know if you need further analysis or model tuning! 🚀