MedBLINK: Probing Basic Perception in Multimodal Language Models for Medicine
Abstract
The MedBLINK benchmark evaluates the perceptual abilities of multimodal language models in clinical image interpretation, revealing significant gaps compared to human performance.
Multimodal language models (MLMs) show promise for clinical decision support and diagnostic reasoning, raising the prospect of end-to-end automated medical image interpretation. However, clinicians are highly selective in adopting AI tools; a model that makes errors on seemingly simple perception tasks, such as determining image orientation or identifying whether a CT scan is contrast-enhanced, is unlikely to be adopted for clinical tasks. We introduce MedBLINK, a benchmark designed to probe these models for such perceptual abilities. MedBLINK spans eight clinically meaningful tasks across multiple imaging modalities and anatomical regions, totaling 1,429 multiple-choice questions over 1,605 images. We evaluate 19 state-of-the-art MLMs, including general-purpose (GPT-4o, Claude 3.5 Sonnet) and domain-specific (Med-Flamingo, LLaVA-Med, RadFM) models. While human annotators achieve 96.4% accuracy, the best-performing model reaches only 65%. These results show that current MLMs frequently fail at routine perceptual checks, suggesting the need to strengthen their visual grounding to support clinical adoption. Data is available on our project page.
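To make the evaluation protocol concrete, below is a minimal sketch of a MedBLINK-style multiple-choice accuracy evaluation. The dataset fields (`image`, `question`, `options`, `answer`) and the `query_model` helper are illustrative assumptions, not the authors' released data format or API.

```python
# Minimal sketch of a multiple-choice perception evaluation in the spirit of MedBLINK.
# Field names and query_model() are hypothetical placeholders, not the released API.
import random


def query_model(image_path: str, question: str, options: list[str]) -> str:
    """Placeholder for an MLM call (e.g., sending the image plus a multiple-choice
    prompt to a model API). Here it simply guesses so the sketch stays runnable."""
    return random.choice(options)


def evaluate(examples: list[dict]) -> float:
    """Accuracy over multiple-choice questions, mirroring the paper's metric."""
    correct = 0
    for ex in examples:
        prediction = query_model(ex["image"], ex["question"], ex["options"])
        correct += int(prediction == ex["answer"])
    return correct / len(examples)


# Toy example reflecting the kind of perceptual check the benchmark probes
# (e.g., contrast enhancement, image orientation).
examples = [
    {
        "image": "ct_slice_001.png",
        "question": "Is this CT scan contrast-enhanced?",
        "options": ["Yes", "No"],
        "answer": "No",
    }
]
print(f"Accuracy: {evaluate(examples):.1%}")
```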
Community
Would you trust ChatGPT with your X-ray if it couldn't tell if the image is upside down?
We introduce MedBLINK, a benchmark that evaluates MLMs on basic perception tasks that are trivial for clinicians but that current models often fail.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images (2025)
- MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports (2025)
- SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning (2025)
- 3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks (2025)
- A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering (2025)
- CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making (2025)
- How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study (2025)