Commit a0980e8 (verified), committed by kazukifujii
1 Parent(s): 9897b2c

Update README.md

  1. README.md +0 -14
README.md CHANGED
@@ -86,20 +86,6 @@ Details are in the paper’s Appendix.
  ## Evaluation
  The model was evaluated using the setup described in the SwallowCode paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include code generation (HumanEval, HumanEval+) and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, GSM8K, BBH). Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
 
- Evaluation Results (Experiment 3)
-
- ### Evaluation Results (Experiment 3)
-
- | Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD2.0 | XWINO | MMLU | GSM8K | BBH | HumanEval | HumanEval+ |
- |------------|------------|----------|-----------|----------|-------|--------|--------|--------|-----------|------------|
- | 10 | 0.3560 | 0.6628 | 0.6010 | 0.3340 | 0.9071| 0.6235 | 0.4564 | 0.6007 | 0.3500 | 0.3488 |
- | 20 | 0.3500 | 0.6613 | 0.6015 | 0.3361 | 0.9054| 0.6237 | 0.4860 | 0.5838 | 0.3744 | 0.3787 |
- | 30 | 0.3620 | 0.6596 | 0.6008 | 0.3359 | 0.9080| 0.6307 | 0.4867 | 0.5921 | 0.3957 | 0.3878 |
- | 40 | 0.3720 | 0.6650 | 0.6030 | 0.3352 | 0.9058| 0.6326 | 0.4822 | 0.5990 | 0.3890 | 0.3915 |
- | 50 | 0.3740 | 0.6677 | 0.6054 | 0.3291 | 0.9019| 0.6327 | 0.4996 | 0.6145 | 0.3945 | 0.3902 |
-
- *Source: Table 4 from the SwallowCode paper, showing performance of the syntax-error and Pylint-filtered (score ≥ 7) Python subset.*
-
 
  ## Citation
 