vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.552 | ± | 0.0315 |
| | | strict-match | 5 | exact_match | ↑ | 0.552 | ± | 0.0315 |

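The Stderr column can be sanity-checked from the Value and the `limit`. The sketch below assumes the harness reports the sample standard error of the mean over per-example 0/1 scores, which for a binary metric reduces to sqrt(p(1-p)/(n-1)); the function name `binary_stderr` is hypothetical and not part of any library.

```python
import math


def binary_stderr(p: float, n: int) -> float:
    """Standard error of the mean for n binary (0/1) scores with mean p.

    Assumption: sample standard deviation (n - 1 denominator) divided by sqrt(n),
    which simplifies to sqrt(p * (1 - p) / (n - 1)) for 0/1 data.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# GSM8K, limit 250, exact_match 0.552
print(f"{binary_stderr(0.552, 250):.4f}")  # 0.0315
# GSM8K, limit 500, exact_match 0.566
print(f"{binary_stderr(0.566, 500):.4f}")  # 0.0222
```

Under this assumption, doubling `limit` from 250 to 500 shrinks the standard error by roughly a factor of sqrt(2), so score differences of one or two points between the runs on this page fall within one standard error.
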
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.566 | ± | 0.0222 |
| | | strict-match | 5 | exact_match | ↑ | 0.564 | ± | 0.0222 |

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3048,dtype=bfloat16,model_impl=transformers,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4316 | ± | 0.0167 |
| - humanities | 2 | none | | acc | ↑ | 0.4205 | ± | 0.0344 |
| - other | 2 | none | | acc | ↑ | 0.4615 | ± | 0.0356 |
| - social sciences | 2 | none | | acc | ↑ | 0.4278 | ± | 0.0359 |
| - stem | 2 | none | | acc | ↑ | 0.4211 | ± | 0.0289 |

vllm (pretrained=/root/autodl-tmp/80-128-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.544 | ± | 0.0316 |
| | | strict-match | 5 | exact_match | ↑ | 0.540 | ± | 0.0316 |

vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |
| | | strict-match | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |

vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.578 | ± | 0.0221 |
| | | strict-match | 5 | exact_match | ↑ | 0.574 | ± | 0.0221 |

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.564 | ± | 0.0314 |
| | | strict-match | 5 | exact_match | ↑ | 0.564 | ± | 0.0314 |

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.570 | ± | 0.0222 |
| | | strict-match | 5 | exact_match | ↑ | 0.566 | ± | 0.0222 |

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4246 | ± | 0.0167 |
| - humanities | 2 | none | | acc | ↑ | 0.3897 | ± | 0.0344 |
| - other | 2 | none | | acc | ↑ | 0.4667 | ± | 0.0356 |
| - social sciences | 2 | none | | acc | ↑ | 0.4222 | ± | 0.0366 |
| - stem | 2 | none | | acc | ↑ | 0.4211 | ± | 0.0290 |

vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |
| | | strict-match | 5 | exact_match | ↑ | 0.56 | ± | 0.0315 |

vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.564 | ± | 0.0222 |
| | | strict-match | 5 | exact_match | ↑ | 0.562 | ± | 0.0222 |

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.564 | ± | 0.0314 |
| | | strict-match | 5 | exact_match | ↑ | 0.564 | ± | 0.0314 |

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.586 | ± | 0.0220 |
| | | strict-match | 5 | exact_match | ↑ | 0.580 | ± | 0.0221 |

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.4292 | ± | 0.0166 |
| - humanities | 2 | none | | acc | ↑ | 0.4051 | ± | 0.0340 |
| - other | 2 | none | | acc | ↑ | 0.4718 | ± | 0.0355 |
| - social sciences | 2 | none | | acc | ↑ | 0.4278 | ± | 0.0362 |
| - stem | 2 | none | | acc | ↑ | 0.4175 | ± | 0.0289 |

vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.564 | ± | 0.0314 |
| | | strict-match | 5 | exact_match | ↑ | 0.560 | ± | 0.0315 |

vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.572 | ± | 0.0221 |
| | | strict-match | 5 | exact_match | ↑ | 0.564 | ± | 0.0222 |

vllm (pretrained=/root/autodl-tmp/82-1024-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.552 | ± | 0.0315 |
| | | strict-match | 5 | exact_match | ↑ | 0.552 | ± | 0.0315 |

vllm (pretrained=/root/autodl-tmp/83-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.532 | ± | 0.0316 |
| | | strict-match | 5 | exact_match | ↑ | 0.532 | ± | 0.0316 |

vllm (pretrained=/root/autodl-tmp/84-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.540 | ± | 0.0316 |
| | | strict-match | 5 | exact_match | ↑ | 0.536 | ± | 0.0316 |

Model tree for noneUsername/Seed-Coder-8B-Instruct-abliterated-W8A8

Base model: ByteDance-Seed/Seed-Coder-8B-Base