vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.908 | ± | 0.0183 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.900 | ± | 0.0190 |

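The Stderr column is consistent with the standard error of the mean of per-question 0/1 scores, computed with the sample standard deviation (n - 1 in the denominator). The sketch below checks this against the gsm8k rows above. The helper name `binary_stderr` is hypothetical, and treating `limit` as the number of scored questions is an assumption inferred from the numbers, not something the results state.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean for n scores of 0 or 1 with the given mean.

    Assumes the sample standard deviation (n - 1 denominator); this is an
    inference from the reported values, not a documented fact of the run.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    # For 0/1 scores the sum of squared deviations equals n * p * (1 - p).
    sample_variance = n * accuracy * (1.0 - accuracy) / (n - 1)
    return math.sqrt(sample_variance / n)


# gsm8k at limit 250: flexible-extract 0.908 and strict-match 0.900
print(round(binary_stderr(0.908, 250), 4))
print(round(binary_stderr(0.900, 250), 4))
```

Rounded to four decimals these give 0.0183 and 0.0190, which agree with the reported values. The same formula explains why the stderr shrinks when `limit` is raised from 250 to 500.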
vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.892 | ± | 0.0139 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.880 | ± | 0.0145 |

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none |  | acc | ↑ | 0.8047 | ± | 0.0129 |
| - humanities | 2 | none |  | acc | ↑ | 0.8513 | ± | 0.0247 |
| - other | 2 | none |  | acc | ↑ | 0.8308 | ± | 0.0264 |
| - social sciences | 2 | none |  | acc | ↑ | 0.8500 | ± | 0.0251 |
| - stem | 2 | none |  | acc | ↑ | 0.7263 | ± | 0.0251 |

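The header line above each table records the settings of that run. A minimal sketch of how such a run could be launched, assuming the results came from the `lm_eval` command-line tool of lm-evaluation-harness with its vLLM backend. The helper `build_lm_eval_command` is hypothetical: it only assembles an argument list from the values shown in the headers and does not start an evaluation.

```python
import shlex
from typing import List, Optional


def build_lm_eval_command(
    model_path: str,
    task: str,
    limit: int,
    num_fewshot: Optional[int] = None,
    max_model_len: int = 3096,
) -> List[str]:
    """Assemble an lm_eval argument list mirroring the header lines above."""
    model_args = ",".join(
        [
            f"pretrained={model_path}",
            "add_bos_token=true",
            f"max_model_len={max_model_len}",
            "dtype=bfloat16",
            "trust_remote_code=true",
        ]
    )
    cmd = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", task,
        "--limit", str(limit),
        "--batch_size", "auto",
    ]
    # The mmlu runs report num_fewshot: None, so the flag is left out there.
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    return cmd


# First gsm8k run: limit 250, 5-shot
gsm8k_cmd = build_lm_eval_command(
    "/root/autodl-tmp/Cydonia-24B-v3", "gsm8k", limit=250, num_fewshot=5
)
print(shlex.join(gsm8k_cmd))
```

Passing `task="mmlu"`, `limit=15` and `max_model_len=3048` without `num_fewshot` would correspond to the mmlu header line.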
vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-86-512-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.848 | ± | 0.0228 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.832 | ± | 0.0237 |

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-90-128-3096-9.999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.900 | ± | 0.0190 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.892 | ± | 0.0197 |

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-90-128-3096-9.999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.896 | ± | 0.0137 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.886 | ± | 0.0142 |

vllm (pretrained=/root/autodl-tmp/rootCydonia-24B-v3-90-128-3096-9.999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none |  | acc | ↑ | 0.7942 | ± | 0.0131 |
| - humanities | 2 | none |  | acc | ↑ | 0.8359 | ± | 0.0252 |
| - other | 2 | none |  | acc | ↑ | 0.8256 | ± | 0.0266 |
| - social sciences | 2 | none |  | acc | ↑ | 0.8389 | ± | 0.0266 |
| - stem | 2 | none |  | acc | ↑ | 0.7158 | ± | 0.0250 |

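To judge whether the gap between `Cydonia-24B-v3` and the `rootCydonia-24B-v3-90-128-3096-9.999` checkpoint is larger than sampling noise, the difference in each score can be compared with the combined standard error. This is a rough sketch that treats the two runs as independent samples. Both runs actually score the same questions, so the result is only an approximation. `z_score` is a hypothetical helper, and the numbers are copied from the tables above.

```python
import math


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """Difference a - b in units of the combined standard error.

    Assumes independent samples, which is a simplification here.
    """
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# metric -> (base value, base stderr, checkpoint value, checkpoint stderr)
rows = {
    "gsm8k flexible-extract (limit 500)": (0.892, 0.0139, 0.896, 0.0137),
    "gsm8k strict-match (limit 500)": (0.880, 0.0145, 0.886, 0.0142),
    "mmlu (limit 15)": (0.8047, 0.0129, 0.7942, 0.0131),
}

for name, (base, se_base, other, se_other) in rows.items():
    z = z_score(other, se_other, base, se_base)
    print(f"{name}: diff={other - base:+.4f}, z={z:+.2f}")
```

All three |z| values come out below 1, so under this approximation none of the differences stands out from sampling noise at these sample sizes.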
vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3-AWQ,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.908 | ± | 0.0183 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.888 | ± | 0.0200 |

vllm (pretrained=/root/autodl-tmp/Cydonia-24B-v3-AWQ,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.892 | ± | 0.0139 |
|  |  | strict-match | 5 | exact_match | ↑ | 0.862 | ± | 0.0154 |

Model tree for noneUsername/Cydonia-24B-v3-W8A8
Base model: mistralai/Mistral-Small-3.1-24B-Base-2503