Pankaj Mathur committed on
Commit
c101123
1 Parent(s): 50686f1

Update README.md

Files changed (1): README.md +8 -22
README.md CHANGED
```diff
@@ -20,30 +20,16 @@ Please note this model has *better code generation capabilities* compare to our
 
 I evaluated orca_mini_v2_7b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
 
-Here are the zero shot metrics results.
-
-|||||||
-|:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
-|**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_easy*|0|0|acc|0.7386|0.0090|
-|*hellaswag*|0|0|acc_norm|0.7394|0.0044|
-|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*mmlu*|0|1|acc_norm|0.4108|0.0153|
-|*Total Zero Shot Average*|0|-|-|0.5821|0.011|
-
-
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
-please note num_fewshots varies for each below task as used by HuggingFaceH4 Open LLM Leaderboard
-
-|||||||
-|:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
-|**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_challenge*|25|0|acc_norm|0.5077|0.0146|
-|*hellaswag*|10|0|acc_norm|0.7617|0.0043|
-|*mmlu*|5|0|acc_norm|0.3955|0.035|
-|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*Total Average*|0|-|-|0.5262|0.0173|
+||||||
+|:------:|:--------:|:-------:|:--------:|
+|**Task**|**Metric**|**Value**|**Stderr**|
+|*arc_challenge*|acc_norm|0.5077|0.0146|
+|*hellaswag*|acc_norm|0.7617|0.0043|
+|*mmlu*|acc_norm|0.3955|0.035|
+|*truthfulqa_mc*|mc2|0.4399|0.0153|
+|*Total Average*|-|0.5262|0.0173|
 
 
 
```
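As a quick sanity check on the retained leaderboard table, the *Total Average* row appears to be the unweighted mean of the four per-task values (a minimal sketch; the few-shot counts in the comments come from the removed table's `num_fewshot` column):

```python
# Per-task scores from the new table in this commit.
scores = {
    "arc_challenge": 0.5077,  # acc_norm, 25-shot
    "hellaswag": 0.7617,      # acc_norm, 10-shot
    "mmlu": 0.3955,           # acc_norm, 5-shot
    "truthfulqa_mc": 0.4399,  # mc2, 0-shot
}

# Unweighted mean over the four tasks.
total_average = sum(scores.values()) / len(scores)
print(round(total_average, 4))  # 0.5262, matching the Total Average row
```

The averaged stderr column works out the same way: (0.0146 + 0.0043 + 0.035 + 0.0153) / 4 = 0.0173, matching the table.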