vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.908|±  |0.0183|
|     |       |strict-match    |     5|exact_match|↑  |0.904|±  |0.0187|
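Each header line above is the lm-evaluation-harness run configuration for the table that follows it. As a sketch only (exact flag names depend on the installed `lm-eval` version), the first gsm8k run corresponds to an invocation of this shape:

```shell
# Reproduction sketch for the first gsm8k table: vLLM backend,
# 5-shot, first 250 examples. Model path is the local path from the header.
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```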

vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.908|±  |0.0129|
|     |       |strict-match    |     5|exact_match|↑  |0.902|±  |0.0133|

vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

|Groups           |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-----------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu             |      2|none  |      |acc   |↑  |0.8035|±  |0.0129|
|- humanities     |      2|none  |      |acc   |↑  |0.8462|±  |0.0247|
|- other          |      2|none  |      |acc   |↑  |0.8256|±  |0.0262|
|- social sciences|      2|none  |      |acc   |↑  |0.8389|±  |0.0271|
|- stem           |      2|none  |      |acc   |↑  |0.7368|±  |0.0246|

vllm (pretrained=/root/autodl-tmp/root90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.900|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.896|±  |0.0193|

vllm (pretrained=/root/autodl-tmp/root90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.892|±  |0.0139|
|     |       |strict-match    |     5|exact_match|↑  |0.886|±  |0.0142|

vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.916|±  |0.0176|
|     |       |strict-match    |     5|exact_match|↑  |0.908|±  |0.0183|

vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter          |n-shot|Metric     |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.904|±  |0.0132|
|     |       |strict-match    |     5|exact_match|↑  |0.898|±  |0.0135|

vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

|Groups           |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-----------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu             |      2|none  |      |acc   |↑  |0.7895|±  |0.0132|
|- humanities     |      2|none  |      |acc   |↑  |0.8256|±  |0.0251|
|- other          |      2|none  |      |acc   |↑  |0.8051|±  |0.0273|
|- social sciences|      2|none  |      |acc   |↑  |0.7889|±  |0.0292|
|- stem           |      2|none  |      |acc   |↑  |0.7544|±  |0.0241|
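As a sanity check on the reported uncertainties: the Stderr column is the standard error of the mean, which for a 0/1 metric like exact_match reduces to sqrt(p·(1−p)/(n−1)) when computed from the sample standard deviation. A minimal sketch, using values from the first gsm8k table above (this formula is an assumption checked numerically against the tables, not taken from the harness source):

```python
import math

def stderr_of_mean(p: float, n: int) -> float:
    """Standard error of the mean for a Bernoulli accuracy p over n samples.

    Uses the sample standard deviation (n - 1 denominator) divided by
    sqrt(n), which for 0/1 scores simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))

# Values from the first gsm8k run (limit 250):
print(round(stderr_of_mean(0.908, 250), 4))  # 0.0183 (flexible-extract)
print(round(stderr_of_mean(0.904, 250), 4))  # 0.0187 (strict-match)
# And from the limit-500 run:
print(round(stderr_of_mean(0.908, 500), 4))  # 0.0129
```

These reproduce the tabulated Stderr values exactly, which is useful when eyeballing whether a gap between the quantized and BF16 runs is within noise.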
Model size: 23.6B params (Safetensors)
Tensor types: BF16 · I8

Model tree for noneUsername/Mistral-Small-3.2-24B-Instruct-hf-W8A8