II-Medical-32B-Preview
I. Model Overview
II-Medical-32B-Preview is the latest advanced large language model developed by Intelligent Internet, specifically designed to enhance AI-driven medical reasoning. As our first model at the 32B scale, it significantly advances our medical question-answering capabilities.
II. Training Methodology
We collected and generated a comprehensive set of medical-domain reasoning datasets and performed SFT fine-tuning on the Qwen3-32B model.
Hyperparameters:
- Max Length: 16378
- Batch Size: 128
- Learning Rate: 2e-5
- Number of Epochs: 4
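For reference, the hyperparameters above can be collected into a training-config dictionary. This is a hypothetical sketch: the key names are assumptions loosely following common SFT trainer conventions, and only the values come from the list above.

```python
# Hypothetical SFT configuration sketch; key names are assumptions,
# only the values are taken from the hyperparameter list above.
sft_config = {
    "base_model": "Qwen/Qwen3-32B",  # fine-tuned from Qwen3-32B, per Section II
    "max_seq_length": 16378,         # Max Length
    "train_batch_size": 128,         # Batch Size (possibly realized via gradient accumulation)
    "learning_rate": 2e-5,           # Learning Rate
    "num_train_epochs": 4,           # Number of Epochs
}
```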
III. Evaluation Results
We evaluated on ten medical QA benchmarks: MedMCQA, MedQA, PubMedQA, HealthBench, the medical subset of MMLU-Pro, small QA sets from The Lancet and the New England Journal of Medicine, the 4-option and 5-option splits from the MedBullets platform, and MedXpertQA.
Model | MedMC | MedQA | PubMed | MMLU-P | HealthBench | Lancet | MedB-4 | MedB-5 | MedX | NEJM | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
HuatuoGPT-o1-72B | 76.76 | 88.85 | 79.90 | 80.46 | 22.73 | 70.87 | 77.27 | 73.05 | 23.53 | 76.29 | 66.97 |
M1 | 62.54 | 75.81 | 75.80 | 65.86 | 15.51 | 62.62 | 63.64 | 59.74 | 19.59 | 64.34 | 56.55 |
Qwen3-8B | 66.53 | 81.38 | 73.9 | 77.85 | 42.27 | 66.26 | 68.83 | 62.66 | 19.59 | 69.65 | 62.89 |
Qwen3-32B | 74.18 | 88.92 | 76.1 | 80.7 | 47.08 | 72.33 | 72.27 | 71.42 | 28.04 | 76.94 | 68.80 |
MedGemma-27B-IT | 73.24 | 87.27 | 70.9 | 80.13 | 46.54 | 70.14 | 75.32 | 73.37 | 25.55 | 76.28 | 67.87 |
II-Medical-8B | 71.57 | 87.90 | 78.7 | 80.46 | 40.02 | 70.38 | 78.25 | 72.07 | 25.26 | 73.13 | 67.77 |
II-Medical-8B-1706 | 74.44 | 88.61 | 79.8 | 81.04 | 46.8 | 71.60 | 80.84 | 74.67 | 29.63 | 77.61 | 70.47 |
II-Medical-32B-Preview | 75.16 | 90.02 | 79.1 | 80.71 | 47.24 | 75.48 | 81.16 | 74.68 | 31.42 | 80.43 | 71.54 |
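As a sanity check on the table, the Avg column is the plain mean of the ten benchmark scores. For example, for the II-Medical-32B-Preview row:

```python
# Scores from the II-Medical-32B-Preview row, in table order:
# MedMC, MedQA, PubMed, MMLU-P, HealthBench, Lancet, MedB-4, MedB-5, MedX, NEJM
scores = [75.16, 90.02, 79.1, 80.71, 47.24, 75.48, 81.16, 74.68, 31.42, 80.43]
avg = round(sum(scores) / len(scores), 2)
print(avg)  # 71.54, matching the reported Avg
```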
IV. Dataset Release
In addition to II-Medical-32B-Preview itself, we also release the SFT training datasets for our II-Medical models, as well as our RL dataset.
We believe this work will be a valuable resource for the community and will contribute to the advancement of medical reasoning capabilities in AI systems.
V. How To Use
Our model can be used in the same manner as the Qwen or Deepseek-R1-Distill models.
For instance, you can easily start a service using vLLM:

```shell
vllm serve Intelligent-Internet/II-Medical-32B-Preview
```

You can also easily start a service using SGLang:

```shell
python -m sglang.launch_server --model Intelligent-Internet/II-Medical-32B-Preview
```
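Both vLLM and SGLang expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the served model can be queried over HTTP. A minimal sketch using only the standard library follows; the localhost URL and port are assumptions — adjust them to your deployment.

```python
import json
import urllib.request

def chat_request(question: str,
                 url: str = "http://localhost:8000/v1/chat/completions") -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the served model."""
    payload = {
        "model": "Intelligent-Internet/II-Medical-32B-Preview",
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("What is the mechanism of action of metformin?")
# response = urllib.request.urlopen(req)  # uncomment once a server is running
```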
VI. Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.9
- When prompting, explicitly request step-by-step reasoning and ask for the final answer within \boxed{} (e.g., "Please reason step-by-step, and put your final answer within \boxed{}.").
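Since responses end with a `\boxed{}` answer, the final answer can be pulled out programmatically. A small illustrative helper (the function name and regex are ours, not part of the model card):

```python
import re

# Recommended instruction from the usage guidelines above.
INSTRUCTION = "Please reason step-by-step, and put your final answer within \\boxed{}."

def extract_boxed(text: str):
    """Return the content of the last \\boxed{...} in a response, or None."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

sample = "Step 1: ... Step 2: ... Therefore the answer is \\boxed{C}."
print(extract_boxed(sample))  # C
```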
VII. Limitations and Considerations
- The training dataset may contain inherent biases from its source materials
- Medical knowledge requires regular updates
- This model is not suitable for clinical use; its outputs must not replace professional medical advice
VIII. Citation
```bibtex
@misc{2025II-Medical-32B-Preview,
  title={II-Medical-32B-Preview: Medical Reasoning Model},
  author={Intelligent Internet},
  year={2025}
}
```