## Introduction

InfiMed-4B is a medical Multimodal Large Language Model (MLLM) developed by the InfiXAI team. It outperforms HuatuoGPT-V-7B and MedGemma-4B-IT. The goal of InfiMed-4B is to deliver a high-performance medical MLLM that remains accessible and affordable for a broad audience. We welcome you to explore its capabilities, and feel free to contact us with any questions or opportunities.
## Model Card

**Model Architecture:**
| Architecture | ViT | LLM | Adapter | Resolution |
| --- | --- | --- | --- | --- |
| 🤗 InfiMed-Foundation-4B | 🤗 siglip-so400m-patch14-384 | 🤗 Qwen3-4B | 2-layer MLP | 384x384xN |
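The 2-layer MLP adapter listed above is the standard way to bridge a ViT encoder and an LLM: it projects per-patch vision features into the LLM's embedding space. A minimal sketch of such an adapter, assuming SigLIP-so400m's 1152-dim features and Qwen3-4B's 2560-dim hidden size (both dimensions are assumptions, as are the layer sizes and activation; InfiMed's exact adapter may differ):

```python
import torch
import torch.nn as nn

class MLPAdapter(nn.Module):
    """2-layer MLP that maps ViT patch features into the LLM embedding space."""

    def __init__(self, vit_dim: int = 1152, llm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vit_features: torch.Tensor) -> torch.Tensor:
        # vit_features: (batch, num_patches, vit_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vit_features)

adapter = MLPAdapter()
# A 384x384 image with patch size 14 yields a 27x27 grid, i.e. 729 patches.
patches = torch.randn(1, 729, 1152)
tokens = adapter(patches)
print(tokens.shape)  # torch.Size([1, 729, 2560])
```

The projected tokens are then concatenated with the text token embeddings and fed to the LLM as a single sequence.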
## Evaluation

InfiMed-4B not only outperforms HuatuoGPT-V-7B and MedGemma-4B-IT but is also competitive with recently released state-of-the-art models.

**Detailed Evaluation:**
| Model | Size | MMMU-Med | VQA-RAD | SLAKE | PathVQA | PMC-VQA | OMVQA | MedXVQA | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| *Proprietary Models* | | | | | | | | | |
| GPT-5 | - | 83.6 | 67.8 | 78.1 | 52.8 | 60.0 | 76.4 | 71.0 | 70.0 |
| GPT-5-mini | - | 80.5 | 66.3 | 76.1 | 52.4 | 57.6 | 70.9 | 60.1 | 66.3 |
| GPT-5-nano | - | 74.1 | 55.4 | 69.3 | 45.4 | 51.3 | 66.5 | 45.1 | 58.2 |
| GPT-4.1 | - | 75.2 | 65.0 | 72.2 | 55.5 | 55.2 | 75.5 | 45.2 | 63.4 |
| Claude Sonnet 4 | - | 74.6 | 67.6 | 70.6 | 54.2 | 54.4 | 65.5 | 43.3 | 61.5 |
| Gemini-2.5-Flash | - | 76.9 | 68.5 | 75.8 | 55.4 | 55.4 | 71.0 | 52.8 | 65.1 |
| *General Open-source Models* | | | | | | | | | |
| Qwen2.5VL-3B | 3B | 51.3 | 56.8 | 63.2 | 37.1 | 50.6 | 64.5 | 20.7 | 49.2 |
| Qwen2.5VL-7B | 7B | 50.6 | 64.5 | 67.2 | 44.1 | 51.9 | 63.6 | 22.3 | 52.0 |
| InternVL3-8B | 8B | 59.2 | 65.4 | 72.8 | 48.6 | 53.8 | 79.1 | 22.4 | 57.3 |
| *Medical Open-source Models* | | | | | | | | | |
| MedGemma-4B-IT | 4B | 43.7 | 49.9 | 76.4 | 48.8 | 49.9 | 69.8 | 22.3 | 51.5 |
| LLaVA-Med-7B | 7B | 29.3 | 53.7 | 48.0 | 38.8 | 30.5 | 44.3 | 20.3 | 37.8 |
| HuatuoGPT-V-7B | 7B | 47.3 | 67.0 | 67.8 | 48.0 | 53.3 | 74.2 | 21.6 | 54.2 |
| Lingshu-7B | 7B | 54.0 | 67.9 | 83.1 | 61.9 | 56.3 | 82.9 | 26.7 | 61.8 |
| BioMediX2-8B | 8B | 39.8 | 49.2 | 57.7 | 37.0 | 43.5 | 63.3 | 21.8 | 44.6 |
| Infi-Med-1.7B | 1.7B | 34.7 | 56.3 | 75.3 | 60.7 | 48.1 | 58.9 | 21.8 | 50.8 |
| Infi-Med-4B | 4B | 43.3 | 57.9 | 77.7 | 63.4 | 56.6 | 76.8 | 21.9 | 56.4 |
## Quick Start

1. Clone the repository:

```shell
git clone https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B
cd InfiMed-Foundation-4B
```

2. Run the model:
```python
from InfiMed import InfiMed
from PIL import Image
import torch

# Load the model from the pretrained checkpoint
model = InfiMed.from_pretrained(
    "InfiX-ai/InfiMed-Foundation-4B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

image_path = "sample.png"  # Replace with the path to your image file
image = Image.open(image_path).convert("RGB")  # Ensure the image is in RGB format

# Prepare input messages
messages = {
    "prompt": "What modality is used to take this image?",
    "image": image,  # Set to None for text-only queries
}

# Generate output
output_text = model.generate_output(messages)

# Print the result
print("Model Response:", output_text)
```
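The model card lists a 384x384xN input resolution, which suggests a high-resolution image is split into N 384x384 tiles before encoding. A minimal sketch of such preprocessing, assuming a simple grid-tiling scheme (the exact tiling strategy InfiMed uses is an assumption, not documented here):

```python
from PIL import Image

TILE = 384  # per-tile input resolution from the model card

def tile_image(image: Image.Image, tile: int = TILE) -> list:
    """Resize each side to the nearest multiple of `tile`, then split into tile x tile crops."""
    w, h = image.size
    nw = max(1, round(w / tile))  # number of tiles across
    nh = max(1, round(h / tile))  # number of tiles down
    image = image.resize((nw * tile, nh * tile))
    tiles = []
    for y in range(nh):
        for x in range(nw):
            box = (x * tile, y * tile, (x + 1) * tile, (y + 1) * tile)
            tiles.append(image.crop(box))
    return tiles

img = Image.new("RGB", (800, 500))  # stand-in for a medical image
tiles = tile_image(img)
print(len(tiles), tiles[0].size)  # 2 (384, 384)
```

Each tile would then be encoded by the ViT independently, giving N sets of patch features for the adapter.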
## Acknowledgements

Our model is built upon numerous outstanding open-source projects, and we are grateful for their contributions. We extend special thanks to the Google team and the Qwen team for their excellent base models.
## License

This project is licensed under the Apache License 2.0.