vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

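| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.552 | ± 0.0315 |
| | | strict-match | 5 | exact_match | 0.552 | ± 0.0315 |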

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

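| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.566 | ± 0.0222 |
| | | strict-match | 5 | exact_match | 0.564 | ± 0.0222 |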

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3048,dtype=bfloat16,model_impl=transformers,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

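| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4316 | ± 0.0167 |
| - humanities | 2 | none | | acc | 0.4205 | ± 0.0344 |
| - other | 2 | none | | acc | 0.4615 | ± 0.0356 |
| - social sciences | 2 | none | | acc | 0.4278 | ± 0.0359 |
| - stem | 2 | none | | acc | 0.4211 | ± 0.0289 |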

vllm (pretrained=/root/autodl-tmp/80-128-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

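| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.544 | ± 0.0316 |
| | | strict-match | 5 | exact_match | 0.540 | ± 0.0316 |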

vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

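| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.56 | ± 0.0315 |
| | | strict-match | 5 | exact_match | 0.56 | ± 0.0315 |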

vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

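| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.578 | ± 0.0221 |
| | | strict-match | 5 | exact_match | 0.574 | ± 0.0221 |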

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

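| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.564 | ± 0.0314 |
| | | strict-match | 5 | exact_match | 0.564 | ± 0.0314 |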

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

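| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.570 | ± 0.0222 |
| | | strict-match | 5 | exact_match | 0.566 | ± 0.0222 |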

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

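| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4246 | ± 0.0167 |
| - humanities | 2 | none | | acc | 0.3897 | ± 0.0344 |
| - other | 2 | none | | acc | 0.4667 | ± 0.0356 |
| - social sciences | 2 | none | | acc | 0.4222 | ± 0.0366 |
| - stem | 2 | none | | acc | 0.4211 | ± 0.0290 |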

vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

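| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.56 | ± 0.0315 |
| | | strict-match | 5 | exact_match | 0.56 | ± 0.0315 |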

vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

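| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.564 | ± 0.0222 |
| | | strict-match | 5 | exact_match | 0.562 | ± 0.0222 |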

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

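| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.564 | ± 0.0314 |
| | | strict-match | 5 | exact_match | 0.564 | ± 0.0314 |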

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

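| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.586 | ± 0.0220 |
| | | strict-match | 5 | exact_match | 0.580 | ± 0.0221 |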

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

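| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.4292 | ± 0.0166 |
| - humanities | 2 | none | | acc | 0.4051 | ± 0.0340 |
| - other | 2 | none | | acc | 0.4718 | ± 0.0355 |
| - social sciences | 2 | none | | acc | 0.4278 | ± 0.0362 |
| - stem | 2 | none | | acc | 0.4175 | ± 0.0289 |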

vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

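| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.564 | ± 0.0314 |
| | | strict-match | 5 | exact_match | 0.560 | ± 0.0315 |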

vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

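| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.572 | ± 0.0221 |
| | | strict-match | 5 | exact_match | 0.564 | ± 0.0222 |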

vllm (pretrained=/root/autodl-tmp/82-1024-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

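| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.552 | ± 0.0315 |
| | | strict-match | 5 | exact_match | 0.552 | ± 0.0315 |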

vllm (pretrained=/root/autodl-tmp/83-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

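| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.532 | ± 0.0316 |
| | | strict-match | 5 | exact_match | 0.532 | ± 0.0316 |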

vllm (pretrained=/root/autodl-tmp/84-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

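| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.540 | ± 0.0316 |
| | | strict-match | 5 | exact_match | 0.536 | ± 0.0316 |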
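
To help read the Stderr column, here is a minimal sketch of how the gsm8k values can be reproduced from the score and the `limit` setting alone. It assumes each sample is scored 0 or 1 and that the reported figure is the sample standard error of the mean (with an n - 1 denominator); the helper name `binomial_stderr` is hypothetical and not part of any evaluation tool. The formula is consistent with the gsm8k rows above, but it is an inference from the numbers rather than a statement about how the harness computes them.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumes the sample variance uses an (n - 1) denominator, so the
    standard error simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (exact_match, limit) pairs copied from gsm8k rows in the logs above.
print(round(binomial_stderr(0.552, 250), 4))  # 0.0315
print(round(binomial_stderr(0.544, 250), 4))  # 0.0316
print(round(binomial_stderr(0.566, 500), 4))  # 0.0222
```

By this measure, the 500-sample gsm8k flexible-extract scores listed above (0.564 to 0.586) span roughly one standard error.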
