
II-Medical-32B-Preview


I. Model Overview

II-Medical-32B-Preview is the latest large language model developed by Intelligent Internet, designed specifically to enhance AI-driven medical reasoning. As our first 32B-scale release, it significantly advances our medical question-answering capabilities.

II. Training Methodology

We collected and generated a comprehensive set of reasoning datasets for the medical domain and performed supervised fine-tuning (SFT) on the Qwen3-32B model.

Training hyperparameters:

  • Max Length: 16378
  • Batch Size: 128
  • Learning Rate: 2e-5
  • Epochs: 4

III. Evaluation Results


We evaluated on ten medical QA benchmarks: MedMCQA, MedQA, PubMedQA, HealthBench, the medical-related questions from MMLU-Pro, small QA sets from The Lancet and the New England Journal of Medicine, the 4-option and 5-option splits from the MedBullets platform, and MedXpertQA.

| Model | MedMCQA | MedQA | PubMedQA | MMLU-Pro | HealthBench | Lancet | MedB-4 | MedB-5 | MedXpertQA | NEJM | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HuatuoGPT-o1-72B | 76.76 | 88.85 | 79.90 | 80.46 | 22.73 | 70.87 | 77.27 | 73.05 | 23.53 | 76.29 | 66.97 |
| M1 | 62.54 | 75.81 | 75.80 | 65.86 | 15.51 | 62.62 | 63.64 | 59.74 | 19.59 | 64.34 | 56.55 |
| Qwen3-8B | 66.53 | 81.38 | 73.9 | 77.85 | 42.27 | 66.26 | 68.83 | 62.66 | 19.59 | 69.65 | 62.89 |
| Qwen3-32B | 74.18 | 88.92 | 76.1 | 80.7 | 47.08 | 72.33 | 72.27 | 71.42 | 28.04 | 76.94 | 68.80 |
| MedGemma-27B-IT | 73.24 | 87.27 | 70.9 | 80.13 | 46.54 | 70.14 | 75.32 | 73.37 | 25.55 | 76.28 | 67.87 |
| II-Medical-8B | 71.57 | 87.90 | 78.7 | 80.46 | 40.02 | 70.38 | 78.25 | 72.07 | 25.26 | 73.13 | 67.77 |
| II-Medical-8B-1706 | 74.44 | 88.61 | 79.8 | 81.04 | 46.8 | 71.60 | 80.84 | 74.67 | 29.63 | 77.61 | 70.47 |
| II-Medical-32B-Preview | 75.16 | 90.02 | 79.1 | 80.71 | 47.24 | 75.48 | 81.16 | 74.68 | 31.42 | 80.43 | 71.54 |

IV. Dataset Release

In addition to II-Medical-32B-Preview, we also release the SFT training datasets for our II-Medical models, as well as our RL dataset.

We believe this work will be a valuable resource for the community and will contribute to the advancement of medical reasoning capabilities in AI systems.

V. How To Use

Our model can be used in the same manner as the Qwen or DeepSeek-R1-Distill models.

For instance, you can easily start a service using vLLM:

vllm serve Intelligent-Internet/II-Medical-32B-Preview

You can also easily start a service using SGLang:

python -m sglang.launch_server --model Intelligent-Internet/II-Medical-32B-Preview
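Once either server is running, the model can be queried through the OpenAI-compatible chat-completions endpoint that both vLLM and SGLang expose. The sketch below is illustrative, not part of the release: it assumes a vLLM server on its default port 8000 (SGLang defaults to 30000; adjust `base_url` accordingly), and the `build_request` and `ask` helper names are our own.

```python
# Sketch: querying a locally served II-Medical-32B-Preview via the
# OpenAI-compatible /v1/chat/completions endpoint (assumed server at
# localhost:8000, vLLM's default; SGLang's default port is 30000).
import json
import urllib.request

MODEL = "Intelligent-Internet/II-Medical-32B-Preview"


def build_request(question: str) -> dict:
    """Assemble a chat request using the recommended sampling parameters
    and the step-by-step / boxed-answer prompt from the usage guidelines."""
    prompt = (
        f"{question}\n"
        "Please reason step-by-step, and put your final answer "
        "within \\boxed{}."
    )
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.9,
    }


def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the request to the server and return the model's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request shape works against either serving backend, since both implement the OpenAI chat-completions convention.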

VI. Usage Guidelines

  • Recommended Sampling Parameters: temperature = 0.6, top_p = 0.9
  • When using, explicitly request step-by-step reasoning and format the final answer within \boxed{} (e.g., "Please reason step-by-step, and put your final answer within \boxed{}.").

VII. Limitations and Considerations

  • Dataset may contain inherent biases from source materials
  • Medical knowledge requires regular updates
  • Please note that this model is not suitable for clinical use

VIII. Citation

@misc{2025II-Medical-32B-Preview,
      title={II-Medical-32B-Preview: Medical Reasoning Model}, 
      author={Intelligent Internet},
      year={2025}
}