vllm (pretrained=/root/autodl-tmp/II-Medical-8B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.800 | ± | 0.0253 |
| | | strict-match | 5 | exact_match | ↑ | 0.876 | ± | 0.0209 |
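The Stderr column appears to match the standard error of a mean of 0/1 scores computed with the sample (n - 1) variance, which for an accuracy p over n docs is sqrt(p(1 - p)/(n - 1)). This is our assumption about how the harness computes it, checked only against the numbers in this card. The sketch below reproduces the limit-250 values above and the limit-500 values of the run that follows.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n scores that are each 0 or 1.

    Uses the sample variance (divisor n - 1), which for 0/1 scores gives
    sqrt(p * (1 - p) / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two samples")
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, number of gsm8k docs) pairs taken from the II-Medical-8B tables
for acc, n in [(0.800, 250), (0.876, 250), (0.796, 500), (0.872, 500)]:
    print(f"acc={acc:.3f}  n={n}  stderr={binary_stderr(acc, n):.4f}")
```

Because the error shrinks roughly as 1/sqrt(n), doubling the limit from 250 to 500 cuts it by about 30%.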
vllm (pretrained=/root/autodl-tmp/II-Medical-8B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.018 |
| | | strict-match | 5 | exact_match | ↑ | 0.872 | ± | 0.015 |

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---|---|---|---:|---|---:|
| mmlu | 2 | none | | acc | ↑ | 0.7216 | ± | 0.0144 |
| - humanities | 2 | none | | acc | ↑ | 0.7077 | ± | 0.0296 |
| - other | 2 | none | | acc | ↑ | 0.7179 | ± | 0.0312 |
| - social sciences | 2 | none | | acc | ↑ | 0.8278 | ± | 0.0278 |
| - stem | 2 | none | | acc | ↑ | 0.6667 | ± | 0.0263 |
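The mmlu row looks like a sample-size-weighted mean of the four category rows rather than a plain average. The sketch below assumes 15 docs per subject (the `limit: 15.0` used by the later MMLU runs; this table's own run header is missing) and lm-eval's split of the 57 MMLU subjects into 13 humanities, 13 other, 12 social sciences and 19 STEM subjects. Under those assumptions it reproduces the mmlu value of each MMLU table in this card.

```python
# Assumed MMLU subject counts per category in lm-eval's grouping (57 subjects total).
SUBJECTS_PER_CATEGORY = {"humanities": 13, "other": 13, "social sciences": 12, "stem": 19}
# Assumed per-subject doc limit, matching the "limit: 15.0" MMLU runs.
DOCS_PER_SUBJECT = 15


def group_accuracy(category_acc: dict[str, float]) -> float:
    """Average category accuracies weighted by each category's number of docs."""
    sizes = {c: k * DOCS_PER_SUBJECT for c, k in SUBJECTS_PER_CATEGORY.items()}
    total = sum(sizes.values())
    return sum(category_acc[c] * sizes[c] for c in sizes) / total


base_model = {"humanities": 0.7077, "other": 0.7179, "social sciences": 0.8278, "stem": 0.6667}
print(f"weighted:   {group_accuracy(base_model):.4f}")
print(f"unweighted: {sum(base_model.values()) / len(base_model):.4f}")
```

The plain average of the four rows is about 0.730, so the weighting matters. The group Stderr appears to be pooled across subjects and is not reproduced by this sketch.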
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-70-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.720 | ± | 0.0285 |
| | | strict-match | 5 | exact_match | ↑ | 0.816 | ± | 0.0246 |
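To judge whether a gap between two variants exceeds sampling noise, a rough check is a z statistic built from the two reported Stderr values. This is an approximation we are assuming, not something the harness reports: both runs score the same 250 gsm8k docs, so a paired test on per-sample outputs (for example McNemar's test) would be more appropriate, and treating the runs as independent ignores their correlation.

```python
import math


def z_and_p(acc_a: float, se_a: float, acc_b: float, se_b: float) -> tuple[float, float]:
    """Approximate z statistic and two-sided p-value for acc_a - acc_b,
    treating the two runs as independent samples."""
    z = (acc_a - acc_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# strict-match, limit 250: II-Medical-8B vs II-Medical-8B-70-256-4096
z, p = z_and_p(0.876, 0.0209, 0.816, 0.0246)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

For this pair the gap is about 1.9 combined standard errors, which under this approximation is suggestive but not conclusive at 250 docs.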
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-64-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.756 | ± | 0.0272 |
| | | strict-match | 5 | exact_match | ↑ | 0.848 | ± | 0.0228 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.764 | ± | 0.0269 |
| | | strict-match | 5 | exact_match | ↑ | 0.852 | ± | 0.0225 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.764 | ± | 0.0269 |
| | | strict-match | 5 | exact_match | ↑ | 0.864 | ± | 0.0217 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-2,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.772 | ± | 0.0266 |
| | | strict-match | 5 | exact_match | ↑ | 0.856 | ± | 0.0222 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-2.7,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.756 | ± | 0.0272 |
| | | strict-match | 5 | exact_match | ↑ | 0.868 | ± | 0.0215 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-2.9,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.772 | ± | 0.0266 |
| | | strict-match | 5 | exact_match | ↑ | 0.868 | ± | 0.0215 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.744 | ± | 0.0277 |
| | | strict-match | 5 | exact_match | ↑ | 0.836 | ± | 0.0235 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |
| | | strict-match | 5 | exact_match | ↑ | 0.876 | ± | 0.0209 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.794 | ± | 0.0181 |
| | | strict-match | 5 | exact_match | ↑ | 0.866 | ± | 0.0152 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---|---|---|---:|---|---:|
| mmlu | 2 | none | | acc | ↑ | 0.7170 | ± | 0.0145 |
| - humanities | 2 | none | | acc | ↑ | 0.7128 | ± | 0.0293 |
| - other | 2 | none | | acc | ↑ | 0.6923 | ± | 0.0323 |
| - social sciences | 2 | none | | acc | ↑ | 0.8111 | ± | 0.0284 |
| - stem | 2 | none | | acc | ↑ | 0.6772 | ± | 0.0262 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |
| | | strict-match | 5 | exact_match | ↑ | 0.884 | ± | 0.0203 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.750 | ± | 0.0194 |
| | | strict-match | 5 | exact_match | ↑ | 0.858 | ± | 0.0156 |
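The 3096-3.1 variant drops from 0.796 to 0.750 (flexible-extract) when the limit goes from 250 to 500. Assuming that `limit` keeps the first N gsm8k test docs and that each model's answers on the shared docs are identical in both runs (greedy decoding, although vLLM batching can make outputs differ slightly), the paired rows imply the accuracy on docs 251 to 500 alone. The helper name `tail_accuracy` is ours, and the result is an inference from the tables, not a measured number.

```python
def tail_accuracy(acc_prefix: float, n_prefix: int, acc_all: float, n_all: int) -> float:
    """Accuracy on docs n_prefix+1..n_all implied by two runs.

    Assumes the n_prefix docs of the smaller run are the first n_prefix docs of the
    larger run and were scored identically in both runs.
    """
    correct_prefix = round(acc_prefix * n_prefix)  # recover integer counts from accuracies
    correct_all = round(acc_all * n_all)
    return (correct_all - correct_prefix) / (n_all - n_prefix)


runs = {
    # model: (flexible@250, flexible@500, strict@250, strict@500), values from the tables
    "II-Medical-8B": (0.800, 0.796, 0.876, 0.872),
    "II-Medical-8B-80-128-4096-3": (0.796, 0.794, 0.876, 0.866),
    "II-Medical-8B-80-128-3096-3.1": (0.796, 0.750, 0.884, 0.858),
}
for name, (f250, f500, s250, s500) in runs.items():
    print(f"{name}: flexible {tail_accuracy(f250, 250, f500, 500):.3f}, "
          f"strict {tail_accuracy(s250, 250, s500, 500):.3f}")
```

Under these assumptions the second half of the 3096-3.1 run scores about 0.704 flexible-extract, against 0.792 for both II-Medical-8B and the 4096-3 variant.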
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-3096-3.1,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---|---|---|---:|---|---:|
| mmlu | 2 | none | | acc | ↑ | 0.7205 | ± | 0.0144 |
| - humanities | 2 | none | | acc | ↑ | 0.7077 | ± | 0.0291 |
| - other | 2 | none | | acc | ↑ | 0.7128 | ± | 0.0315 |
| - social sciences | 2 | none | | acc | ↑ | 0.8167 | ± | 0.0286 |
| - stem | 2 | none | | acc | ↑ | 0.6737 | ± | 0.0262 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-128-4096-3.3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.696 | ± | 0.0292 |
| | | strict-match | 5 | exact_match | ↑ | 0.852 | ± | 0.0225 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-256-4096-3,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.776 | ± | 0.0264 |
| | | strict-match | 5 | exact_match | ↑ | 0.864 | ± | 0.0217 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-80-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.712 | ± | 0.0287 |
| | | strict-match | 5 | exact_match | ↑ | 0.844 | ± | 0.0230 |
vllm (pretrained=/root/autodl-tmp/II-Medical-8B-82-128-3096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.708 | ± | 0.0288 |
| | | strict-match | 5 | exact_match | ↑ | 0.880 | ± | 0.0206 |
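Each run header in this card records the lm-evaluation-harness settings used. The sketch below is a hypothetical helper (`parse_run_header` and `run` are our names, not part of the harness) that turns such a header back into keyword arguments for `lm_eval.simple_evaluate`. It assumes the header format shown here, and that the task name (`gsm8k` or `mmlu`), which the header does not include, is supplied separately.

```python
import re

HEADER_RE = re.compile(r"^(\w+) \((.*)\), gen_kwargs: \((.*?)\), (.*)$")


def parse_run_header(line: str) -> dict:
    """Convert a printed run header into lm_eval.simple_evaluate keyword arguments."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised run header: {line!r}")
    model, model_args, _gen_kwargs, rest = match.groups()
    # rest looks like "limit: 250.0, num_fewshot: 5, batch_size: auto"
    settings = dict(item.split(": ", 1) for item in rest.split(", "))
    kwargs = {"model": model, "model_args": model_args}
    if settings.get("limit", "None") != "None":
        kwargs["limit"] = int(float(settings["limit"]))
    if settings.get("num_fewshot", "None") != "None":
        kwargs["num_fewshot"] = int(settings["num_fewshot"])
    batch_size = settings.get("batch_size", "auto")
    kwargs["batch_size"] = batch_size if batch_size == "auto" else int(batch_size)
    return kwargs


def run(header: str, task: str) -> dict:
    """Re-run one evaluation (needs lm-evaluation-harness with vLLM and the model weights)."""
    import lm_eval  # imported lazily so parse_run_header can be used without it

    results = lm_eval.simple_evaluate(tasks=[task], **parse_run_header(header))
    return results["results"][task]
```

For example, `run(header, "gsm8k")` with the first header in this card should approximate the first table, up to nondeterminism in vLLM generation.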
Model tree for noneUsername/II-Medical-8B-W8A8
Base model: Intelligent-Internet/II-Medical-8B