vllm (pretrained=/root/autodl-tmp/Apriel-Nemotron-15b-Thinker,add_bos_token=true,max_model_len=8096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.808 | ± 0.0250 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.796 | ± 0.0255 |
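
These tables are lm-evaluation-harness output using the vLLM backend; the header line above each table records the exact model arguments. A minimal sketch of the command that would reproduce this run, assuming `lm_eval` (v0.4+) and `vllm` are installed:

```bash
# Reproduce the 250-sample, 5-shot GSM8K run for the baseline checkpoint.
# Model path and arguments are taken verbatim from the header line above.
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Apriel-Nemotron-15b-Thinker,add_bos_token=true,max_model_len=8096,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```

Swapping the `pretrained` path for `/root/autodl-tmp/80-128` or `/root/autodl-tmp/80-256` should reproduce the corresponding runs for the quantized variants below.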

vllm (pretrained=/root/autodl-tmp/Apriel-Nemotron-15b-Thinker,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.798 | ± 0.0180 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.804 | ± 0.0178 |

vllm (pretrained=/root/autodl-tmp/Apriel-Nemotron-15b-Thinker,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.6959 | ± 0.0150 |
| - humanities      | 2       | none   |        | acc ↑  | 0.7231 | ± 0.0299 |
| - other           | 2       | none   |        | acc ↑  | 0.7026 | ± 0.0318 |
| - social sciences | 2       | none   |        | acc ↑  | 0.7722 | ± 0.0300 |
| - stem            | 2       | none   |        | acc ↑  | 0.6246 | ± 0.0278 |
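
The MMLU runs leave `num_fewshot` at the task default and cap evaluation at 15 documents (in lm_eval, `--limit` appears to apply per subtask within the mmlu group). A sketch under the same assumptions as above:

```bash
# Reproduce the MMLU group scores above for the baseline checkpoint.
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Apriel-Nemotron-15b-Thinker,add_bos_token=true,max_model_len=3048,dtype=bfloat16 \
  --tasks mmlu \
  --limit 15 \
  --batch_size auto
```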

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=8096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.812 | ± 0.0248 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.796 | ± 0.0255 |

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.802 | ± 0.0178 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.798 | ± 0.0180 |

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.6959 | ± 0.0151 |
| - humanities      | 2       | none   |        | acc ↑  | 0.7128 | ± 0.0303 |
| - other           | 2       | none   |        | acc ↑  | 0.7077 | ± 0.0318 |
| - social sciences | 2       | none   |        | acc ↑  | 0.7500 | ± 0.0312 |
| - stem            | 2       | none   |        | acc ↑  | 0.6421 | ± 0.0273 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=8096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.848 | ± 0.0228 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.820 | ± 0.0243 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.838 | ± 0.0165 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.828 | ± 0.0169 |

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.7018 | ± 0.0149 |
| - humanities      | 2       | none   |        | acc ↑  | 0.7333 | ± 0.0296 |
| - other           | 2       | none   |        | acc ↑  | 0.6923 | ± 0.0322 |
| - social sciences | 2       | none   |        | acc ↑  | 0.7556 | ± 0.0304 |
| - stem            | 2       | none   |        | acc ↑  | 0.6526 | ± 0.0271 |