Update README.md
README.md (CHANGED)
@@ -101,23 +101,20 @@ This is a test project for merging models.
 
 # Open LLM Leaderboard Evaluation Results
 
-Detailed results can be found here.
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jan-hq__trinity-v1).
 
 | Metric | Value |
 |-----------------------|---------------------------|
-| Avg. |
-| ARC (25-shot) |
-| HellaSwag (10-shot) |
-| MMLU (5-shot) |
-| TruthfulQA (0-shot) |
-| Winogrande (5-shot) |
-| GSM8K (5-shot) |
+| Avg. | 74.8 |
+| ARC (25-shot) | 72.27 |
+| HellaSwag (10-shot) | 88.36 |
+| MMLU (5-shot) | 65.2 |
+| TruthfulQA (0-shot) | 69.31 |
+| Winogrande (5-shot) | 82 |
+| GSM8K (5-shot) | 71.65 |
 
 # Acknowlegement
-- [mergekit](https://github.com/cg123/mergekit
-)
+- [mergekit](https://github.com/cg123/mergekit)
 - [DARE](https://github.com/yule-BUAA/MergeLM/blob/main/README.md)
--
-[SLERP](https://github.com/Digitous/LLM-SLERP-Merge)
-
+- [SLERP](https://github.com/Digitous/LLM-SLERP-Merge)
 - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
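As a quick sanity check on the new table, the Avg. row matches the plain arithmetic mean of the six benchmark scores; a minimal sketch, assuming the leaderboard averages the benchmarks uniformly:

```python
# Sanity check: the "Avg." row equals the arithmetic mean of the six
# benchmark scores added in this change (assumes uniform averaging).
scores = {
    "ARC (25-shot)": 72.27,
    "HellaSwag (10-shot)": 88.36,
    "MMLU (5-shot)": 65.2,
    "TruthfulQA (0-shot)": 69.31,
    "Winogrande (5-shot)": 82.0,
    "GSM8K (5-shot)": 71.65,
}
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # prints 74.80, consistent with the 74.8 in the table
```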
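The acknowledgements credit lm-evaluation-harness, which is the tooling behind these leaderboard numbers. Below is a minimal sketch of re-running one benchmark locally; the task name `arc_challenge`, the 25-shot setting (mirroring the ARC row above), and the model id `jan-hq/trinity-v1` (inferred from the results link) are all assumptions, and exact task names vary across harness versions:

```python
# Hypothetical re-run of one leaderboard benchmark with lm-evaluation-harness.
# Task name, few-shot count, and model id are assumptions; see lead-in above.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                 # Hugging Face model backend
    model_args="pretrained=jan-hq/trinity-v1",  # model inferred from the results link
    tasks=["arc_challenge"],
    num_fewshot=25,                             # ARC is reported 25-shot in the table
)
print(results["results"]["arc_challenge"])      # per-task metrics dict
```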