## Introduction

InfiMed-4B is a medical Multimodal Large Language Model (MLLM) developed by the InfiXAI team. It outperforms HuatuoGPT-V-7B and MedGemma-4B-IT. The goal of InfiMed-4B is to deliver a high-performance medical MLLM that remains accessible and affordable for a broad audience. We welcome you to explore its capabilities, and feel free to contact us with any questions or opportunities.
## Model Card

**Model Architecture:**
| Architecture | ViT | LLM | Adapter | Resolution |
| --- | --- | --- | --- | --- |
| 🤗 InfiMed-Foundation-4B | 🤗 siglip-so400m-patch14-384 | 🤗 Qwen3-4B | 2-layer MLP | 384x384xN |
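The 2-layer MLP adapter listed above is the standard way to bridge a ViT encoder and an LLM: it projects per-patch vision features into the LLM's embedding space. A minimal sketch of such an adapter, assuming SigLIP-so400m's 1152-dim features and Qwen3-4B's 2560-dim hidden size (both dimensions are assumptions, as are the layer sizes and activation; InfiMed's exact adapter may differ):

```python
import torch
import torch.nn as nn

class MLPAdapter(nn.Module):
    """2-layer MLP that maps ViT patch features into the LLM embedding space."""

    def __init__(self, vit_dim: int = 1152, llm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vit_features: torch.Tensor) -> torch.Tensor:
        # vit_features: (batch, num_patches, vit_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vit_features)

adapter = MLPAdapter()
# A 384x384 image with patch size 14 yields a 27x27 grid, i.e. 729 patches.
patches = torch.randn(1, 729, 1152)
tokens = adapter(patches)
print(tokens.shape)  # torch.Size([1, 729, 2560])
```

The projected tokens are then concatenated with the text token embeddings and fed to the LLM as a single sequence.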
## Evaluation

InfiMed-4B not only outperforms HuatuoGPT-V-7B and MedGemma-4B-IT but is also competitive with recently released state-of-the-art models.

**Detailed Evaluation:**
| Model | Size | MMMU-Med | VQA-RAD | SLAKE | PathVQA | PMC-VQA | OMVQA | MedXVQA | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| *Proprietary Models* | | | | | | | | | |
| GPT-5 | - | 83.6 | 67.8 | 78.1 | 52.8 | 60.0 | 76.4 | 71.0 | 70.0 |
| GPT-5-mini | - | 80.5 | 66.3 | 76.1 | 52.4 | 57.6 | 70.9 | 60.1 | 66.3 |
| GPT-5-nano | - | 74.1 | 55.4 | 69.3 | 45.4 | 51.3 | 66.5 | 45.1 | 58.2 |
| GPT-4.1 | - | 75.2 | 65.0 | 72.2 | 55.5 | 55.2 | 75.5 | 45.2 | 63.4 |
| Claude Sonnet 4 | - | 74.6 | 67.6 | 70.6 | 54.2 | 54.4 | 65.5 | 43.3 | 61.5 |
| Gemini-2.5-Flash | - | 76.9 | 68.5 | 75.8 | 55.4 | 55.4 | 71.0 | 52.8 | 65.1 |
| *General Open-source Models* | | | | | | | | | |
| Qwen2.5VL-3B | 3B | 51.3 | 56.8 | 63.2 | 37.1 | 50.6 | 64.5 | 20.7 | 49.2 |
| Qwen2.5VL-7B | 7B | 50.6 | 64.5 | 67.2 | 44.1 | 51.9 | 63.6 | 22.3 | 52.0 |
| InternVL3-8B | 8B | 59.2 | 65.4 | 72.8 | 48.6 | 53.8 | 79.1 | 22.4 | 57.3 |
| *Medical Open-source Models* | | | | | | | | | |
| MedGemma-4B-IT | 4B | 43.7 | 49.9 | 76.4 | 48.8 | 49.9 | 69.8 | 22.3 | 51.5 |
| LLaVA-Med-7B | 7B | 29.3 | 53.7 | 48.0 | 38.8 | 30.5 | 44.3 | 20.3 | 37.8 |
| HuatuoGPT-V-7B | 7B | 47.3 | 67.0 | 67.8 | 48.0 | 53.3 | 74.2 | 21.6 | 54.2 |
| Lingshu-7B | 7B | 54.0 | 67.9 | 83.1 | 61.9 | 56.3 | 82.9 | 26.7 | 61.8 |
| BioMediX2-8B | 8B | 39.8 | 49.2 | 57.7 | 37.0 | 43.5 | 63.3 | 21.8 | 44.6 |
| Infi-Med-1.7B | 1.7B | 34.7 | 56.3 | 75.3 | 60.7 | 48.1 | 58.9 | 21.8 | 50.8 |
| Infi-Med-4B | 4B | 43.3 | 57.9 | 77.7 | 63.4 | 56.6 | 76.8 | 21.9 | 56.4 |
## Quick Start

1. Clone the repository:

```shell
git clone https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B
cd InfiMed-Foundation-4B
```

2. Run the model:
```python
from InfiMed import InfiMed
from PIL import Image
import torch

# Load the model from the pretrained checkpoint
model = InfiMed.from_pretrained(
    "InfiX-ai/InfiMed-Foundation-4B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

image_path = "sample.png"  # Replace with the path to your image file
image = Image.open(image_path).convert("RGB")  # Ensure the image is in RGB format

# Prepare input messages
messages = {
    "prompt": "What modality is used to take this image?",
    "image": image,  # Set to None for text-only queries
}

# Generate output
output_text = model.generate_output(messages)

# Print the result
print("Model Response:", output_text)
```
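The model card lists a 384x384xN input resolution, which suggests a high-resolution image is split into N 384x384 tiles before encoding. A minimal sketch of such preprocessing, assuming a simple grid-tiling scheme (the exact tiling strategy InfiMed uses is an assumption, not documented here):

```python
from PIL import Image

TILE = 384  # per-tile input resolution from the model card

def tile_image(image: Image.Image, tile: int = TILE) -> list:
    """Resize each side to the nearest multiple of `tile`, then split into tile x tile crops."""
    w, h = image.size
    nw = max(1, round(w / tile))  # number of tiles across
    nh = max(1, round(h / tile))  # number of tiles down
    image = image.resize((nw * tile, nh * tile))
    tiles = []
    for y in range(nh):
        for x in range(nw):
            box = (x * tile, y * tile, (x + 1) * tile, (y + 1) * tile)
            tiles.append(image.crop(box))
    return tiles

img = Image.new("RGB", (800, 500))  # stand-in for a medical image
tiles = tile_image(img)
print(len(tiles), tiles[0].size)  # 2 (384, 384)
```

Each tile would then be encoded by the ViT independently, giving N sets of patch features for the adapter.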
## Acknowledgements

Our model is built upon numerous outstanding open-source projects, and we are grateful for their contributions. We extend special thanks to the Google team and the Qwen team for their excellent base models.
## License

This project is licensed under the Apache License 2.0.