Pankaj Mathur committed
Commit c101123
Parent(s): 50686f1
Update README.md
README.md
CHANGED
@@ -20,30 +20,16 @@ Please note this model has *better code generation capabilities* compare to our
 
 I evaluated orca_mini_v2_7b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
 
-Here are the zero shot metrics results.
-
-|||||||
-|:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
-|**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_easy*|0|0|acc|0.7386|0.0090|
-|*hellaswag*|0|0|acc_norm|0.7394|0.0044|
-|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*mmlu*|0|1|acc_norm|0.4108|0.0153|
-|*Total Zero Shot Average*|0|-|-|0.5821|0.011|
-
-
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
-
-
-
-
-
-|*
-|*
-|*
-|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*Total Average*|0|-|-|0.5262|0.0173|
+||||||
+|:------:|:--------:|:-------:|:--------:|
+|**Task**|**Metric**|**Value**|**Stderr**|
+|*arc_challenge*|acc_norm|0.5077|0.0146|
+|*hellaswag*|acc_norm|0.7617|0.0043|
+|*mmlu*|acc_norm|0.3955|0.035|
+|*truthfulqa_mc*|mc2|0.4399|0.0153|
+|*Total Average*|-|0.5262|0.0173|
 
 
 
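The *Total Average* row in the added table appears to be the plain arithmetic mean of the four task rows, for both the Value and Stderr columns. The sketch below is only a sanity check under that assumption; the `results` dictionary and the `total_average` helper are hypothetical names introduced here, not part of the README or of the evaluation harness, and the numbers are copied by hand from the added rows.

```python
from statistics import mean

# Values copied from the added table rows: task -> (value, stderr).
# These names are illustrative, not produced by lm-evaluation-harness.
results = {
    "arc_challenge": (0.5077, 0.0146),
    "hellaswag": (0.7617, 0.0043),
    "mmlu": (0.3955, 0.035),
    "truthfulqa_mc": (0.4399, 0.0153),
}


def total_average(rows):
    """Return the unweighted mean of values and of stderrs, rounded to 4 places.

    Assumption: the table averages each column independently, which is how
    the reported numbers line up; this is not a statistically pooled stderr.
    """
    values = [value for value, _ in rows.values()]
    stderrs = [stderr for _, stderr in rows.values()]
    return round(mean(values), 4), round(mean(stderrs), 4)


avg_value, avg_stderr = total_average(results)
print(avg_value, avg_stderr)
```

Averaging the Stderr column this way reproduces the table's figure, but it should be read as a summary of the per-task errors rather than as the standard error of the averaged score.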