kazukifujii committed · verified
Commit 7c07587 · 1 Parent(s): 0a591e4

Update README.md

Files changed (1):
  1. README.md +0 -10
README.md CHANGED
@@ -80,16 +80,6 @@ Details are in the paper’s Appendix.
 ## Evaluation
 The model was evaluated using the setup described in the SwallowCode paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include code generation (HumanEval, HumanEval+) and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, GSM8K, BBH). Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
 
-Evaluation Results (Experiment 1)
-
-| Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD 2.0 | XWINO | MMLU | GSM8K | BBH | HumanEval | HumanEval+ |
-|---|---|---|---|---|---|---|---|---|---|---|
-| 10 | 0.3640 | 0.6659 | 0.5995 | 0.3354 | 0.9032 | 0.6294 | 0.4602 | 0.6019 | 0.3366 | 0.3366 |
-| 20 | 0.3540 | 0.6567 | 0.6019 | 0.3360 | 0.9024 | 0.6238 | 0.4852 | 0.5898 | 0.3433 | 0.3433 |
-| 30 | 0.3700 | 0.6588 | 0.6034 | 0.3377 | 0.9045 | 0.6263 | 0.5072 | 0.5939 | 0.3402 | 0.3421 |
-| 40 | 0.3800 | 0.6618 | 0.6053 | 0.3380 | 0.9097 | 0.6341 | 0.5011 | 0.6016 | 0.3659 | 0.3701 |
-| 50 | 0.3700 | 0.6679 | 0.6054 | 0.3350 | 0.9045 | 0.6340 | 0.5027 | 0.6091 | 0.3689 | 0.3720 |
-
 ## Citation
 
 ```bibtex
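
For reference, an evaluation like the one described in the hunk above can be driven programmatically. Below is a minimal sketch, assuming lm-evaluation-harness's Python API (`lm_eval.simple_evaluate`); the checkpoint id, task subset, and batch size are placeholders, not values from this repository.

```python
# Minimal sketch of an lm-evaluation-harness run via its Python API.
# The checkpoint id below is a placeholder, not the actual model repo.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=org/model-checkpoint",  # placeholder checkpoint id
    tasks=["hellaswag", "gsm8k"],                  # two of the benchmarks named above
    batch_size=8,                                  # placeholder batch size
)

# simple_evaluate returns a dict; per-task metrics sit under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

The CLI entry point (`lm_eval --model hf --model_args pretrained=... --tasks ...`) reports the same per-task metrics.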
 