vllm (pretrained=/root/autodl-tmp/II-Medical-8B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.800|±|0.0253|
| | |strict-match|5|exact_match|↑|0.876|±|0.0209|
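
The header line of each run lists the backend, model arguments, few-shot count, and sample limit. As a rough sketch of how such a run might be launched from Python, the snippet below packs those settings into keyword arguments for `lm_eval.simple_evaluate`. The helper names `build_eval_kwargs` and `run` are hypothetical, and the choice of the Python entry point rather than the command line is an assumption; the card does not say how the runs were started.

```python
def build_eval_kwargs(model_path, limit=250, num_fewshot=5):
    """Mirror the settings shown in a run's header line (hypothetical helper)."""
    model_args = ",".join([
        f"pretrained={model_path}",
        "add_bos_token=true",
        "max_model_len=3096",
        "dtype=bfloat16",
        "trust_remote_code=true",
    ])
    return {
        "model": "vllm",
        "model_args": model_args,
        "tasks": ["gsm8k"],
        "num_fewshot": num_fewshot,
        "limit": limit,
        "batch_size": "auto",
    }


def run(model_path, limit=250):
    """Evaluate one checkpoint on gsm8k; needs lm_eval and vLLM installed."""
    # Imported lazily so build_eval_kwargs can be used without lm_eval present.
    import lm_eval

    results = lm_eval.simple_evaluate(**build_eval_kwargs(model_path, limit=limit))
    return results["results"]["gsm8k"]
```

Calling `run("/root/autodl-tmp/II-Medical-8B")` would then return the gsm8k metrics dictionary for the unquantized model, provided the path exists locally.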

vllm (pretrained=/root/autodl-tmp/II-Medical-8B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.796|±|0.018|
| | |strict-match|5|exact_match|↑|0.872|±|0.015|

|Groups|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|mmlu|2|none| |acc|↑|0.7216|±|0.0144|
| - humanities|2|none| |acc|↑|0.7077|±|0.0296|
| - other|2|none| |acc|↑|0.7179|±|0.0312|
| - social sciences|2|none| |acc|↑|0.8278|±|0.0278|
| - stem|2|none| |acc|↑|0.6667|±|0.0263|
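
The Stderr column for gsm8k can be sanity-checked by hand. The sketch below assumes the reported value is the standard error of the mean of per-question 0/1 scores, computed with the sample (n - 1) standard deviation; that assumption matches the baseline numbers above to the printed precision, but it is not confirmed by the card itself. The function name `binomial_stderr` is made up for illustration.

```python
import math


def binomial_stderr(p, n):
    """Standard error of the mean of n 0/1 scores with mean p.

    Uses the sample (n - 1) variance, which for 0/1 data reduces to
    sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (accuracy, number of questions) taken from the baseline gsm8k rows above
for p, n in [(0.800, 250), (0.876, 250), (0.796, 500), (0.872, 500)]:
    print(f"acc={p:.3f} n={n} stderr={binomial_stderr(p, n):.4f}")
```

Doubling the limit from 250 to 500 questions shrinks the standard error by roughly a factor of 1/sqrt(2), which is why the 500-question rows show tighter intervals.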

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-70-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.720|±|0.0285|
| | |strict-match|5|exact_match|↑|0.816|±|0.0246|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-64-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.756|±|0.0272|
| | |strict-match|5|exact_match|↑|0.848|±|0.0228|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.764|±|0.0269|
| | |strict-match|5|exact_match|↑|0.852|±|0.0225|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.764|±|0.0269|
| | |strict-match|5|exact_match|↑|0.864|±|0.0217|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-2,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.772|±|0.0266|
| | |strict-match|5|exact_match|↑|0.856|±|0.0222|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-2.7,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.756|±|0.0272|
| | |strict-match|5|exact_match|↑|0.868|±|0.0215|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-2.9,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.772|±|0.0266|
| | |strict-match|5|exact_match|↑|0.868|±|0.0215|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.744|±|0.0277|
| | |strict-match|5|exact_match|↑|0.836|±|0.0235|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.796|±|0.0255|
| | |strict-match|5|exact_match|↑|0.876|±|0.0209|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.794|±|0.0181|
| | |strict-match|5|exact_match|↑|0.866|±|0.0152|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|Groups|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|mmlu|2|none| |acc|↑|0.7170|±|0.0145|
| - humanities|2|none| |acc|↑|0.7128|±|0.0293|
| - other|2|none| |acc|↑|0.6923|±|0.0323|
| - social sciences|2|none| |acc|↑|0.8111|±|0.0284|
| - stem|2|none| |acc|↑|0.6772|±|0.0262|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.796|±|0.0255|
| | |strict-match|5|exact_match|↑|0.884|±|0.0203|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.750|±|0.0194|
| | |strict-match|5|exact_match|↑|0.858|±|0.0156|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

|Groups|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|mmlu|2|none| |acc|↑|0.7205|±|0.0144|
| - humanities|2|none| |acc|↑|0.7077|±|0.0291|
| - other|2|none| |acc|↑|0.7128|±|0.0315|
| - social sciences|2|none| |acc|↑|0.8167|±|0.0286|
| - stem|2|none| |acc|↑|0.6737|±|0.0262|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3.3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.696|±|0.0292|
| | |strict-match|5|exact_match|↑|0.852|±|0.0225|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-256-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.776|±|0.0264|
| | |strict-match|5|exact_match|↑|0.864|±|0.0217|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.712|±|0.0287|
| | |strict-match|5|exact_match|↑|0.844|±|0.0230|

vllm (pretrained=/root/autodl-tmp/II-Medical-8B-82-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|--:|---|--:|---|---|--:|---|--:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.708|±|0.0288|
| | |strict-match|5|exact_match|↑|0.880|±|0.0206|
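
When comparing a quantized checkpoint against the unquantized baseline, the reported standard errors give a quick way to judge whether a gap is larger than sampling noise. The following is a minimal sketch using a two-sample z-score on the strict-match rows at limit 250. It treats the two runs as independent, which is an assumption: both runs score the same questions, so a paired test on per-question results would be more appropriate. Treat the output as a rough screen only. The function name `z_score` is hypothetical.

```python
import math


def z_score(acc_a, se_a, acc_b, se_b):
    """Difference in accuracy divided by the combined standard error."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Baseline II-Medical-8B, gsm8k strict-match, limit 250
baseline = (0.876, 0.0209)

# Two quantized checkpoints from the runs above, same metric and limit
candidates = {
    "II-Medical-8B-70-256-4096": (0.816, 0.0246),
    "II-Medical-8B-80-128-3096-3.1": (0.884, 0.0203),
}

for name, (acc, se) in candidates.items():
    z = z_score(acc, se, *baseline)
    verdict = "within noise" if abs(z) < 1.96 else "likely a real gap"
    print(f"{name}: z={z:+.2f} ({verdict} at the 95% level)")
```

With only 250 questions per run, differences of a few points in either direction fall inside this noise band, which is one reason some checkpoints were re-run at limit 500.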
Safetensors. Model size: 8.19B params. Tensor type: BF16 · I8

Model tree for noneUsername/II-Medical-8B-W8A8: Quantized (14), including this model