noneUsername/Cydonia-24B-v3-W8A8

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.908	±	0.0183
		strict-match	5	exact_match	↑	0.900	±	0.0190
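
The Stderr column can be sanity-checked against the Value and limit columns. The sketch below assumes that `limit: 250.0` means 250 scored examples and that each example is scored 0 or 1; under those assumptions, the sample standard error of the mean (with an n - 1 denominator) matches the reported figures. The helper name `binomial_stderr` is hypothetical and is not part of the evaluation harness.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the n - 1 (sample) denominator."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# gsm8k flexible-extract, baseline, limit 250 (reported Stderr: 0.0183)
print(round(binomial_stderr(0.908, 250), 4))

# gsm8k flexible-extract, baseline, limit 500 (reported Stderr: 0.0139)
print(round(binomial_stderr(0.892, 500), 4))
```

Doubling the limit from 250 to 500 shrinks the standard error by roughly a factor of sqrt(2), which is why the limit-500 runs have tighter error bars.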

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.892	±	0.0139
		strict-match	5	exact_match	↑	0.880	±	0.0145

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8047	±	0.0129
- humanities	2	none	acc	↑	0.8513	±	0.0247
- other	2	none	acc	↑	0.8308	±	0.0264
- social sciences	2	none	acc	↑	0.8500	±	0.0251
- stem	2	none	acc	↑	0.7263	±	0.0251

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-86-512-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.848	±	0.0228
		strict-match	5	exact_match	↑	0.832	±	0.0237

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-90-128-3096-9.999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.900	±	0.0190
		strict-match	5	exact_match	↑	0.892	±	0.0197
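
To judge whether a quantized checkpoint differs meaningfully from the bf16 baseline, the reported values and standard errors can be combined into a rough z-score. This is a minimal sketch, not part of the original evaluation: it treats the two runs as independent samples, which is an assumption (both runs use the same GSM8K items, so a paired test would be more precise), and the helper name `z_score` is hypothetical.

```python
import math


def z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """Difference of two accuracies divided by the combined standard error.

    Assumes the two runs are independent, which is only an approximation here.
    """
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# gsm8k flexible-extract at limit 250, values copied from the tables above.
baseline = (0.908, 0.0183)        # Cydonia-24B-v3
run_86_512 = (0.848, 0.0228)      # rootCydonia-24B-v3-86-512-3096
run_90_128 = (0.900, 0.0190)      # rootCydonia-24B-v3-90-128-3096-9.999

for name, (acc, se) in [("86-512", run_86_512), ("90-128", run_90_128)]:
    z = z_score(baseline[0], baseline[1], acc, se)
    verdict = "outside" if abs(z) > 1.96 else "within"
    print(f"{name}: z = {z:.2f} ({verdict} the ~95% noise band)")
```

Under this approximation, the 90-128 run is within noise of the baseline at 250 samples, while the 86-512 run falls just outside the 95% band.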

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.896	±	0.0137
		strict-match	5	exact_match	↑	0.886	±	0.0142

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-90-128-3096-9.999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7942	±	0.0131
- humanities	2	none	acc	↑	0.8359	±	0.0252
- other	2	none	acc	↑	0.8256	±	0.0266
- social sciences	2	none	acc	↑	0.8389	±	0.0266
- stem	2	none	acc	↑	0.7158	±	0.0250

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3-AWQ,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.908	±	0.0183
		strict-match	5	exact_match	↑	0.888	±	0.0200

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3-AWQ,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.892	±	0.0139
		strict-match	5	exact_match	↑	0.862	±	0.0154

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7942	±	0.0131
- humanities	2	none	acc	↑	0.8359	±	0.0252
- other	2	none	acc	↑	0.8256	±	0.0266
- social sciences	2	none	acc	↑	0.8389	±	0.0266
- stem	2	none	acc	↑	0.7158	±	0.0250
