Introduction

InfiMed-4B is a medical Multimodal Large Language Model (MLLM) developed by the InfiXAI team. It outperforms HuatuoGPT-V-7B and MedGemma-4B-IT on standard medical VQA benchmarks. The goal of InfiMed-4B is to deliver a high-performance medical MLLM that remains accessible and affordable for a broad audience. We welcome you to explore its capabilities, and feel free to contact us with any questions or opportunities.

Model Card

Model Architecture:

| Architecture | ViT | LLM | Adapter | Resolution |
|---|---|---|---|---|
| 🤗InfiMed-Foundation-4B | 🤗siglip-so400m-patch14-384 | 🤗Qwen3-4B | 2-layer MLP | 384x384xN |
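The 2-layer MLP adapter projects vision-encoder features into the LLM's embedding space. Below is a minimal sketch of that wiring, assuming the standard hidden sizes of the listed components (1152 for siglip-so400m-patch14-384, 2560 for Qwen3-4B) and a GELU activation; the exact layer shapes and activation in InfiMed may differ.

```python
import torch
import torch.nn as nn

class MLPAdapter(nn.Module):
    """Sketch of a 2-layer MLP adapter: vision features -> LLM embedding space."""

    def __init__(self, vision_dim=1152, llm_dim=2560):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, llm_dim)
        self.act = nn.GELU()  # assumed activation; not specified in the card
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

# A 384x384 image through siglip-so400m-patch14-384 yields 27x27 = 729 patch tokens
adapter = MLPAdapter()
vision_tokens = torch.randn(1, 729, 1152)   # (batch, patches, vision_dim)
llm_tokens = adapter(vision_tokens)         # (batch, patches, llm_dim)
print(llm_tokens.shape)
```

The projected tokens can then be concatenated with the text token embeddings before being fed to the LLM, which is the common design for MLLMs of this kind.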

Evaluation

InfiMed-4B not only outperforms HuatuoGPT-V-7B and MedGemma-4B-IT but is also competitive with recently released state-of-the-art (SoTA) models.

Detailed Evaluation:

| Model | Size | MMMU-Med | VQA-RAD | SLAKE | PathVQA | PMC-VQA | OMVQA | MedXVQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| **Proprietary Models** | | | | | | | | | |
| GPT-5 | | 83.6 | 67.8 | 78.1 | 52.8 | 60.0 | 76.4 | 71.0 | 70.0 |
| GPT-5-mini | | 80.5 | 66.3 | 76.1 | 52.4 | 57.6 | 70.9 | 60.1 | 66.3 |
| GPT-5-nano | | 74.1 | 55.4 | 69.3 | 45.4 | 51.3 | 66.5 | 45.1 | 58.2 |
| GPT-4.1 | | 75.2 | 65.0 | 72.2 | 55.5 | 55.2 | 75.5 | 45.2 | 63.4 |
| Claude Sonnet 4 | | 74.6 | 67.6 | 70.6 | 54.2 | 54.4 | 65.5 | 43.3 | 61.5 |
| Gemini-2.5-Flash | | 76.9 | 68.5 | 75.8 | 55.4 | 55.4 | 71.0 | 52.8 | 65.1 |
| **General Open-source Models** | | | | | | | | | |
| Qwen2.5VL-3B | 3B | 51.3 | 56.8 | 63.2 | 37.1 | 50.6 | 64.5 | 20.7 | 49.2 |
| Qwen2.5VL-7B | 7B | 50.6 | 64.5 | 67.2 | 44.1 | 51.9 | 63.6 | 22.3 | 52.0 |
| InternVL3-8B | 8B | 59.2 | 65.4 | 72.8 | 48.6 | 53.8 | 79.1 | 22.4 | 57.3 |
| **Medical Open-source Models** | | | | | | | | | |
| MedGemma-4B-IT | 4B | 43.7 | 49.9 | 76.4 | 48.8 | 49.9 | 69.8 | 22.3 | 51.5 |
| LLaVA-Med-7B | 7B | 29.3 | 53.7 | 48.0 | 38.8 | 30.5 | 44.3 | 20.3 | 37.8 |
| HuatuoGPT-V-7B | 7B | 47.3 | 67.0 | 67.8 | 48.0 | 53.3 | 74.2 | 21.6 | 54.2 |
| Lingshu-7B | 7B | 54.0 | 67.9 | 83.1 | 61.9 | 56.3 | 82.9 | 26.7 | 61.8 |
| BioMediX2-8B | 8B | 39.8 | 49.2 | 57.7 | 37.0 | 43.5 | 63.3 | 21.8 | 44.6 |
| Infi-Med-1.7B | 1.7B | 34.7 | 56.3 | 75.3 | 60.7 | 48.1 | 58.9 | 21.8 | 50.8 |
| Infi-Med-4B | 4B | 43.3 | 57.9 | 77.7 | 63.4 | 56.6 | 76.8 | 21.9 | 56.4 |
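The Avg. column appears to be the unweighted mean of the seven benchmark scores. For example, recomputing the Lingshu-7B row reproduces its reported average:

```python
# Lingshu-7B row from the table above (seven benchmark scores)
scores = [54.0, 67.9, 83.1, 61.9, 56.3, 82.9, 26.7]

# Unweighted mean, rounded to one decimal place as in the table
avg = round(sum(scores) / len(scores), 1)
print(avg)  # 61.8, matching the Avg. column
```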

Quick Start:

1. Clone the repository

git clone https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B
cd InfiMed-Foundation-4B

2. Run the model

from InfiMed import InfiMed
from PIL import Image
import torch

# Load the model from the pretrained checkpoint
model = InfiMed.from_pretrained("InfiX-ai/InfiMed-Foundation-4B", device_map="auto", torch_dtype=torch.bfloat16)

image_path = "sample.png"  # Replace with the path to your image file
image = Image.open(image_path).convert("RGB")  # Ensure the image is in RGB format

# Prepare input messages
messages = {
    "prompt": "What modality is used to take this image?",
    "image": image  # No image for this example, set to None
}

# Generate output
output_text = model.generate_output(messages)

# Print the result
print("Model Response:", output_text)

Acknowledgements

Our model is built upon numerous outstanding open-source projects, and we are grateful for their contributions. We extend special thanks to the Google team and the Qwen team for their great base models.

License

This project is licensed under Apache License 2.0.
