vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.864 | ± | 0.0217 |
| | | strict-match | 5 | exact_match | ↑ | 0.860 | ± | 0.0220 |
vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.868 | ± | 0.0152 |
| | | strict-match | 5 | exact_match | ↑ | 0.864 | ± | 0.0153 |
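The Stderr column in the two base-model gsm8k runs above appears consistent with the standard error of the mean of 0/1 scores using the sample (n - 1) variance. The following is a minimal sketch for checking that, assuming `limit` equals the number of scored samples; `sample_stderr` is a hypothetical helper name, not part of the evaluation harness.

```python
import math


def sample_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores with mean p,
    using the sample variance (divides by n - 1). Assumed formula."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (accuracy, limit) pairs copied from the base-model gsm8k tables above.
runs = [(0.864, 250), (0.860, 250), (0.868, 500), (0.864, 500)]

for p, n in runs:
    print(f"acc={p:.3f} n={n} stderr={sample_stderr(p, n):.4f}")
```

Doubling the sample count from 250 to 500 shrinks the standard error by roughly a factor of the square root of 2, which matches the drop from about 0.022 to about 0.015 in the tables.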
vllm (pretrained=/root/autodl-tmp/Devstral-Small-2505,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7965 | ± | 0.0129 |
| - humanities | 2 | none | | acc | ↑ | 0.8205 | ± | 0.0244 |
| - other | 2 | none | | acc | ↑ | 0.8308 | ± | 0.0259 |
| - social sciences | 2 | none | | acc | ↑ | 0.8444 | ± | 0.0261 |
| - stem | 2 | none | | acc | ↑ | 0.7263 | ± | 0.0252 |
vllm (pretrained=/root/autodl-tmp/80-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.840 | ± | 0.0232 |
| | | strict-match | 5 | exact_match | ↑ | 0.832 | ± | 0.0237 |
vllm (pretrained=/root/autodl-tmp/86-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.840 | ± | 0.0232 |
| | | strict-match | 5 | exact_match | ↑ | 0.828 | ± | 0.0239 |
vllm (pretrained=/root/autodl-tmp/86-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.846 | ± | 0.0162 |
| | | strict-match | 5 | exact_match | ↑ | 0.836 | ± | 0.0166 |
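To judge whether the gap between this checkpoint and the base model is larger than sampling noise, one rough approach is a z-score on the difference of the two accuracies. The sketch below uses the 500-sample flexible-extract gsm8k results for the base model (0.868 ± 0.0152) and for `86-128-4096` (0.846 ± 0.0162). It assumes the two runs are independent samples, which is only an approximation because both runs score the same questions; `z_score` is a hypothetical helper name.

```python
import math


def z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """Difference of two accuracies divided by the combined standard error.
    Assumes the two estimates are independent (an approximation here)."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (accuracy, stderr) from the limit=500 gsm8k flexible-extract rows.
base = (0.868, 0.0152)    # Devstral-Small-2505
quant = (0.846, 0.0162)   # 86-128-4096

z = z_score(*base, *quant)
print(f"diff={base[0] - quant[0]:.3f} z={z:.2f}")
```

Under these assumptions the z-score comes out close to 1, so the 2.2-point gap on its own is within about one combined standard error. A paired comparison on per-question results would give a sharper answer.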
vllm (pretrained=/root/autodl-tmp/86-128-4096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7532 | ± | 0.0140 |
| - humanities | 2 | none | | acc | ↑ | 0.7744 | ± | 0.0272 |
| - other | 2 | none | | acc | ↑ | 0.7692 | ± | 0.0291 |
| - social sciences | 2 | none | | acc | ↑ | 0.8278 | ± | 0.0277 |
| - stem | 2 | none | | acc | ↑ | 0.6807 | ± | 0.0268 |
vllm (pretrained=/root/autodl-tmp/root-W8A8-86-128-3096-2,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.812 | ± | 0.0248 |
| | | strict-match | 5 | exact_match | ↑ | 0.800 | ± | 0.0253 |
vllm (pretrained=/root/autodl-tmp/86-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.848 | ± | 0.0228 |
| | | strict-match | 5 | exact_match | ↑ | 0.836 | ± | 0.0235 |
vllm (pretrained=/root/autodl-tmp/86-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.844 | ± | 0.0162 |
| | | strict-match | 5 | exact_match | ↑ | 0.830 | ± | 0.0168 |
vllm (pretrained=/root/autodl-tmp/86-256-4096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7614 | ± | 0.0137 |
| - humanities | 2 | none | | acc | ↑ | 0.7590 | ± | 0.0277 |
| - other | 2 | none | | acc | ↑ | 0.7949 | ± | 0.0270 |
| - social sciences | 2 | none | | acc | ↑ | 0.8389 | ± | 0.0273 |
| - stem | 2 | none | | acc | ↑ | 0.6912 | ± | 0.0265 |
vllm (pretrained=/root/autodl-tmp/86-512-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.820 | ± | 0.0243 |
| | | strict-match | 5 | exact_match | ↑ | 0.808 | ± | 0.0250 |
vllm (pretrained=/root/autodl-tmp/865-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.840 | ± | 0.0232 |
| | | strict-match | 5 | exact_match | ↑ | 0.828 | ± | 0.0239 |
vllm (pretrained=/root/autodl-tmp/87-64-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.824 | ± | 0.0241 |
| | | strict-match | 5 | exact_match | ↑ | 0.808 | ± | 0.0250 |
vllm (pretrained=/root/autodl-tmp/87-64-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.844 | ± | 0.0230 |
| | | strict-match | 5 | exact_match | ↑ | 0.836 | ± | 0.0235 |
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.860 | ± | 0.0220 |
| | | strict-match | 5 | exact_match | ↑ | 0.856 | ± | 0.0222 |
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.85 | ± | 0.0160 |
| | | strict-match | 5 | exact_match | ↑ | 0.84 | ± | 0.0164 |
vllm (pretrained=/root/autodl-tmp/87-128-3096,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7509 | ± | 0.0139 |
| - humanities | 2 | none | | acc | ↑ | 0.7949 | ± | 0.0261 |
| - other | 2 | none | | acc | ↑ | 0.7641 | ± | 0.0287 |
| - social sciences | 2 | none | | acc | ↑ | 0.8167 | ± | 0.0285 |
| - stem | 2 | none | | acc | ↑ | 0.6702 | ± | 0.0268 |
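The overall MMLU accuracies reported so far can be put side by side to see how far each checkpoint sits below the base model. The sketch below only does the subtraction on numbers copied from the tables above; the dictionary keys are the checkpoint directory names from the `pretrained=` paths, and the variable names are illustrative.

```python
# Overall mmlu accuracy (limit=15 per subtask) copied from the tables above.
BASE_MMLU = 0.7965  # Devstral-Small-2505

checkpoint_mmlu = {
    "86-128-4096": 0.7532,
    "86-256-4096": 0.7614,
    "87-128-3096": 0.7509,
}

# Accuracy change relative to the base model (negative means a drop).
deltas = {name: acc - BASE_MMLU for name, acc in checkpoint_mmlu.items()}

for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.4f}")
```

With a per-run standard error of roughly 0.014, differences between the three checkpoints themselves (under one point) are small relative to the noise, while each sits about 3.5 to 4.6 points below the base model.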
vllm (pretrained=/root/autodl-tmp/87-128-3096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.844 | ± | 0.0230 |
| | | strict-match | 5 | exact_match | ↑ | 0.832 | ± | 0.0237 |
vllm (pretrained=/root/autodl-tmp/87-128-3096-4,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.804 | ± | 0.0252 |
| | | strict-match | 5 | exact_match | ↑ | 0.804 | ± | 0.0252 |
vllm (pretrained=/root/autodl-tmp/87-128-4096-2,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.824 | ± | 0.0241 |
| | | strict-match | 5 | exact_match | ↑ | 0.808 | ± | 0.0250 |
vllm (pretrained=/root/autodl-tmp/87-256-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.828 | ± | 0.0239 |
| | | strict-match | 5 | exact_match | ↑ | 0.816 | ± | 0.0246 |
vllm (pretrained=/root/autodl-tmp/87-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.828 | ± | 0.0239 |
| | | strict-match | 5 | exact_match | ↑ | 0.824 | ± | 0.0241 |
vllm (pretrained=/root/autodl-tmp/88-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.848 | ± | 0.0228 |
| | | strict-match | 5 | exact_match | ↑ | 0.844 | ± | 0.0230 |
vllm (pretrained=/root/autodl-tmp/885-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.836 | ± | 0.0235 |
| | | strict-match | 5 | exact_match | ↑ | 0.820 | ± | 0.0243 |
vllm (pretrained=/root/autodl-tmp/89-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.828 | ± | 0.0239 |
| | | strict-match | 5 | exact_match | ↑ | 0.824 | ± | 0.0241 |
Model tree for noneUsername/Devstral-Small-2505-W8A8-Defective
Base model: mistralai/Devstral-Small-2505