---
library_name: transformers
tags:
  - medical
  - vision-language
  - clip
  - various modalities
license: mit
language:
  - en
---

# Model Card for ConceptCLIP

## Model Details

### Model Description

ConceptCLIP is a large-scale vision-language pre-trained model enhanced with medical concepts, covering diverse medical image modalities. It delivers robust performance across multiple medical imaging tasks through concept-enhanced language-image alignment.

- **Developed by:** Yuxiang Nie, Sunan He, Yequan Bie, Yihui Wang, Zhixuan Chen, Shu Yang, Hao Chen
- **Model type:** Vision-language pre-trained model (medical-specialized)
- **Language(s):** English (text); multi-modal (medical imaging)
- **License:** MIT
- **Finetuned from model:** Based on OpenCLIP

### Model Sources

- **Repository:** https://huggingface.co/JerrryNie/ConceptCLIP
- **Paper:** [arXiv:2501.15579](https://arxiv.org/abs/2501.15579)

## Uses

### Direct Use

- Zero-shot medical image classification
- Cross-modal image-text retrieval (see the retrieval sketch after this list)
- Zero-shot concept annotation
- Feature extraction for whole-slide image analysis
- Feature extraction for medical report generation
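
As an illustration of the cross-modal retrieval item above, the sketch below ranks candidate text descriptions against a query image by cosine similarity. It assumes the output dict exposes L2-normalized `image_features` and `text_features` (the same fields used in the quick-start example further below); the file path and candidate texts are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)

# Placeholder query image and candidate descriptions to rank
image = Image.open('example_data/chest_X-ray.jpg').convert('RGB')
candidates = [
    'frontal chest radiograph with clear lung fields',
    'axial brain MRI showing the ventricles',
    'dermoscopic image of a pigmented skin lesion',
]

inputs = processor(
    images=image,
    text=candidates,
    return_tensors='pt',
    padding=True,
    truncation=True
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    # Cosine similarity between the image and every candidate text
    sims = (outputs['image_features'] @ outputs['text_features'].t())[0]

# Rank candidates from most to least similar to the query image
for idx in sims.argsort(descending=True):
    print(f'{sims[idx]:.3f}  {candidates[idx]}')
```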

### Downstream Use

- Fine-tuning for specific medical imaging tasks (e.g., CT, MRI, X-ray analysis) such as classification and visual question answering (a linear-probe sketch follows this list)
- Concept bottleneck models for explainable predictions
- Integration into clinical decision support systems
- Medical education and training tools
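
As a concrete example of the fine-tuning item above, a common lightweight recipe is a linear probe: freeze ConceptCLIP and train a single linear layer on its image embeddings. This is a minimal sketch, not the paper's recipe; the features, labels, embedding dimension, and hyperparameters below are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical precomputed ConceptCLIP image embeddings and class labels
train_features = torch.randn(512, 768)      # [num_samples, embed_dim] (dim assumed)
train_labels = torch.randint(0, 3, (512,))  # toy 3-class task

# Frozen backbone, trainable linear head
probe = nn.Linear(train_features.shape[1], 3)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(probe(train_features), train_labels)
    loss.backward()
    optimizer.step()
```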

### Out-of-Scope Use

- Direct clinical diagnosis without clinical validation
- Non-medical image analysis
- General-purpose vision tasks outside the medical domain

## Bias, Risks, and Limitations

- Trained primarily on medical imaging data, which may contain demographic biases
- Performance may vary across medical imaging modalities
- Should not be used as a sole diagnostic tool without human oversight

### Recommendations

- Validate outputs with clinical experts before medical decision-making
- Fine-tune on domain-specific data for specialized applications
- Conduct bias analysis when deploying in new clinical environments

## How to Get Started with the Model

The snippet below runs zero-shot classification: it embeds one image and several candidate label prompts, then turns their similarities into probabilities.

```python
from transformers import AutoModel, AutoProcessor
import torch
from PIL import Image

# Load the model and processor (custom code from the repo is required)
model = AutoModel.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)

# Build one text prompt per candidate label
image = Image.open('example_data/chest_X-ray.jpg').convert('RGB')
labels = ['chest X-ray', 'brain MRI', 'skin lesion']
texts = [f'a medical image of {label}' for label in labels]

inputs = processor(
    images=image,
    text=texts,
    return_tensors='pt',
    padding=True,
    truncation=True
).to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    # Scale image-text similarities and normalize them into probabilities
    logits = (outputs['logit_scale'] * outputs['image_features'] @ outputs['text_features'].t()).softmax(dim=-1)[0]

print({label: f"{prob:.2%}" for label, prob in zip(labels, logits)})
```
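
For the feature-extraction uses listed under Direct Use (whole-slide image analysis, report generation), the same forward pass can supply image embeddings alone. A minimal sketch under stated assumptions: the patch file names are placeholders, and a dummy text prompt is passed only because the joint forward shown above expects both modalities.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('JerrryNie/ConceptCLIP', trust_remote_code=True)

# Placeholder tiles, e.g. patches cropped from a whole-slide image
patches = [Image.open(p).convert('RGB') for p in ['patch_0.png', 'patch_1.png']]

inputs = processor(
    images=patches,
    text=['a medical image'],  # dummy prompt; only the image features are used
    return_tensors='pt',
    padding=True,
    truncation=True
).to(model.device)

with torch.no_grad():
    feats = model(**inputs)['image_features']  # [num_patches, embed_dim]

print(feats.shape)
```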

## Training Details

### Training Data

- Large-scale medical image-text pairs enriched with concept information

### Training Procedure

- Built on the OpenCLIP architecture with medical concept integration
- Pre-training with image-text alignment (IT-Align) and patch-concept alignment (PC-Align) objectives (see the loss sketch after this list)
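
For intuition only, here is a generic CLIP-style symmetric contrastive loss of the kind image-text alignment objectives build on. It is not a reproduction of IT-Align or PC-Align as defined in the paper (which, given the SigLIP base, may instead use a sigmoid-style loss).

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_features, text_features, logit_scale):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = logit_scale * image_features @ text_features.t()  # [B, B]
    targets = torch.arange(logits.shape[0], device=logits.device)
    # Each image should match its paired text, and vice versa
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```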

#### Training Hyperparameters

- Base architecture: SigLIP-ViT-400M-16 (vision) + PubMedBERT (text)
- Training regime: Mixed-precision training
- Batch size: 12,288 without PC-Align; 6,144 with PC-Align
- Learning rate: 5e-4 without PC-Align; 3e-4 with PC-Align

## Evaluation

### Testing Data & Metrics

#### Testing Data

- Evaluated on multiple open-source medical imaging benchmarks covering medical image diagnosis, cross-modal retrieval, medical visual question answering, medical report generation, whole-slide image analysis, and explainable AI

## Citation

**BibTeX:**

```bibtex
@article{nie2025conceptclip,
  title={ConceptCLIP: Towards Trustworthy Medical AI via Concept-Enhanced Contrastive Language-Image Pre-training},
  author={Nie, Yuxiang and He, Sunan and Bie, Yequan and Wang, Yihui and Chen, Zhixuan and Yang, Shu and Chen, Hao},
  journal={arXiv preprint arXiv:2501.15579},
  year={2025}
}
```

**APA:**

Nie, Y., He, S., Bie, Y., Wang, Y., Chen, Z., Yang, S., & Chen, H. (2025). ConceptCLIP: Towards trustworthy medical AI via concept-enhanced contrastive language-image pre-training. arXiv preprint arXiv:2501.15579.

## Model Card Contact

Yuxiang Nie: [email protected]