kazukifujii committed on
Commit 9461818 · verified · 1 Parent(s): ce8b390

Update README.md

Files changed (1)
  1. README.md +0 -12
README.md CHANGED
@@ -84,18 +84,6 @@ Details are in the paper’s Appendix.
 ## Evaluation
 The model was evaluated using the setup described in the SwallowCode paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include code generation (HumanEval, HumanEval+) and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, GSM8K, BBH). Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
 
-Evaluation Results (Experiment 2)
-
-| Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD2.0 | XWINO | MMLU | GSM8K | BBH | HumanEval | HumanEval+ |
-|------------|------------|----------|-----------|----------|--------|--------|--------|--------|-----------|------------|
-| 10 | 0.3560 | 0.6675 | 0.6015 | 0.3385 | 0.9062 | 0.6321 | 0.4784 | 0.5881 | 0.3604 | 0.3713 |
-| 20 | 0.3520 | 0.6635 | 0.6026 | 0.3364 | 0.9049 | 0.6252 | 0.4784 | 0.5781 | 0.3591 | 0.3585 |
-| 30 | 0.3560 | 0.6637 | 0.6012 | 0.3375 | 0.9080 | 0.6313 | 0.5019 | 0.5950 | 0.3701 | 0.3762 |
-| 40 | 0.3580 | 0.6679 | 0.6046 | 0.3346 | 0.9062 | 0.6330 | 0.5019 | 0.5998 | 0.3720 | 0.3689 |
-| 50 | 0.3660 | 0.6694 | 0.6055 | 0.3340 | 0.9084 | 0.6325 | 0.5155 | 0.6044 | 0.3787 | 0.3787 |
-
-*Source: Table 3 from the SwallowCode paper, showing performance of the syntax-error-free Python subset.*
-
 
 ## Citation