Update README.md
README.md (CHANGED)
@@ -101,23 +101,20 @@ This is a test project for merging models.
 
 # Open LLM Leaderboard Evaluation Results
 
-Detailed results can be found here.
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jan-hq__trinity-v1).
 
 | Metric | Value |
 |-----------------------|---------------------------|
-| Avg. |
-| ARC (25-shot) |
-| HellaSwag (10-shot) |
-| MMLU (5-shot) |
-| TruthfulQA (0-shot) |
-| Winogrande (5-shot) |
-| GSM8K (5-shot) |
+| Avg. | 74.8 |
+| ARC (25-shot) | 72.27 |
+| HellaSwag (10-shot) | 88.36 |
+| MMLU (5-shot) | 65.2 |
+| TruthfulQA (0-shot) | 69.31 |
+| Winogrande (5-shot) | 82 |
+| GSM8K (5-shot) | 71.65 |
 
 # Acknowlegement
-- [mergekit](https://github.com/cg123/mergekit
-)
+- [mergekit](https://github.com/cg123/mergekit)
 - [DARE](https://github.com/yule-BUAA/MergeLM/blob/main/README.md)
--
-[SLERP](https://github.com/Digitous/LLM-SLERP-Merge)
-
+- [SLERP](https://github.com/Digitous/LLM-SLERP-Merge)
 - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
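As a quick sanity check on the new table, the Avg. row matches the plain arithmetic mean of the six benchmark scores; a minimal sketch, assuming the leaderboard averages the benchmarks uniformly:

```python
# Sanity check: the "Avg." row equals the arithmetic mean of the six
# benchmark scores added in this change (assumes uniform averaging).
scores = {
    "ARC (25-shot)": 72.27,
    "HellaSwag (10-shot)": 88.36,
    "MMLU (5-shot)": 65.2,
    "TruthfulQA (0-shot)": 69.31,
    "Winogrande (5-shot)": 82.0,
    "GSM8K (5-shot)": 71.65,
}
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # prints 74.80, consistent with the 74.8 in the table
```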
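The acknowledgements credit lm-evaluation-harness, which is the tooling behind these leaderboard numbers. Below is a minimal sketch of re-running one benchmark locally; the task name `arc_challenge`, the 25-shot setting (mirroring the ARC row above), and the model id `jan-hq/trinity-v1` (inferred from the results link) are all assumptions, and exact task names vary across harness versions:

```python
# Hypothetical re-run of one leaderboard benchmark with lm-evaluation-harness.
# Task name, few-shot count, and model id are assumptions; see lead-in above.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                 # Hugging Face model backend
    model_args="pretrained=jan-hq/trinity-v1",  # model inferred from the results link
    tasks=["arc_challenge"],
    num_fewshot=25,                             # ARC is reported 25-shot in the table
)
print(results["results"]["arc_challenge"])      # per-task metrics dict
```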