Update README.md
README.md CHANGED
@@ -80,16 +80,6 @@ Details are in the paper’s Appendix.
 ## Evaluation
 
 The model was evaluated using the setup described in the SwallowCode paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include code generation (HumanEval, HumanEval+) and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, GSM8K, BBH). Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
 
-Evaluation Results (Experiment 1)
-
-| Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD 2.0 | XWINO | MMLU | GSM8K | BBH | HumanEval | HumanEval+ |
-|---|---|---|---|---|---|---|---|---|---|---|
-| 10 | 0.3640 | 0.6659 | 0.5995 | 0.3354 | 0.9032 | 0.6294 | 0.4602 | 0.6019 | 0.3366 | 0.3366 |
-| 20 | 0.3540 | 0.6567 | 0.6019 | 0.3360 | 0.9024 | 0.6238 | 0.4852 | 0.5898 | 0.3433 | 0.3433 |
-| 30 | 0.3700 | 0.6588 | 0.6034 | 0.3377 | 0.9045 | 0.6263 | 0.5072 | 0.5939 | 0.3402 | 0.3421 |
-| 40 | 0.3800 | 0.6618 | 0.6053 | 0.3380 | 0.9097 | 0.6341 | 0.5011 | 0.6016 | 0.3659 | 0.3701 |
-| 50 | 0.3700 | 0.6679 | 0.6054 | 0.3350 | 0.9045 | 0.6340 | 0.5027 | 0.6091 | 0.3689 | 0.3720 |
-
 ## Citation
 
 ```bibtex
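
For reference, the general-task portion of the evaluation setup mentioned in the diff above could be launched with the lm-evaluation-harness Python API along the lines of the sketch below. This is a minimal illustration assuming the v0.4+ `lm_eval.simple_evaluate` interface; the checkpoint path, task list, and batch size are placeholder assumptions rather than values from this repository, and the code-generation benchmarks (HumanEval, HumanEval+) would be run separately through BigCodeBench.

```python
# Minimal sketch (assumptions): evaluate one intermediate checkpoint on a few of
# the listed general tasks using lm-evaluation-harness (v0.4+ Python API).
import lm_eval

MODEL_PATH = "path/to/10B-token-checkpoint"  # placeholder path, not from this repo

results = lm_eval.simple_evaluate(
    model="hf",                                  # Hugging Face transformers backend
    model_args=f"pretrained={MODEL_PATH},dtype=bfloat16",
    tasks=["openbookqa", "hellaswag", "gsm8k"],  # illustrative subset of the listed tasks
    batch_size=8,
)

# Per-task metrics (accuracy, exact match, ...) are collected under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```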