MAIRA-2 (finetuned from Vicuna-7B, RAD-DINO)

MAIRA-2 is a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays. MAIRA-2 has been built for research purposes only and is being shared to facilitate comparison and further research.

📌 Note: For original model weights, refer to microsoft/maira-2.

📃 Original paper: MAIRA-2: Grounded Radiology Report Generation.

🔬 Experimental Usage in Libra's repo

This model checkpoint is intended for experimental use and can be tested directly within the Libra repository.

For comparable benchmark results, we recommend evaluating on the official test set from X-iZhang/MIMIC-CXR-RRG.

Key Modification

To enable the re-trained vision encoder during inference and to match MAIRA-2's behaviour, which uses feature_maps from the Dinov2Backbone (hidden states with LayerNorm applied) rather than raw hidden_states, apply the following configuration:

"unfreeze_mm_vision_tower": true,
"use_maira_feature_norm": true

This setting is designed specifically for generating the findings section from a single frontal-view chest X-ray.

It is not applicable to grounding tasks or settings involving multiple image inputs.
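As a minimal sketch, the two flags above could be merged into a JSON configuration file as shown here. This assumes the options are read from a JSON file such as the checkpoint's config.json, which this card does not state explicitly; the helper name apply_maira_overrides and the constant MAIRA_OVERRIDES are hypothetical and not part of Libra.

```python
import json
from pathlib import Path

# The two keys and values are quoted from this model card.
MAIRA_OVERRIDES = {
    "unfreeze_mm_vision_tower": True,
    "use_maira_feature_norm": True,
}


def apply_maira_overrides(config_path):
    """Merge the two flags into a JSON config file and return the updated dict.

    Hypothetical helper: it assumes the flags live in a JSON file
    (for example the checkpoint's config.json), which is an assumption.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(MAIRA_OVERRIDES)  # existing keys are kept, the two flags are set
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling apply_maira_overrides("path/to/config.json") rewrites the file in place, so it may be worth keeping a copy of the original configuration.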

Use-case: Findings generation without grounding

❗️MAIRA-2 requires a strict chat template, which must be provided manually.

# ✅ With clinical instruction
prompt_with_clinical = (
    "Provide a description of the findings in the radiology study in comparison to the prior frontal image. "
    "INDICATION: Dyspnea. TECHNIQUE: PA and lateral views of the chest. COMPARISON: None."
)

# ✅ Without clinical instruction — placeholders (INDICATION, TECHNIQUE, COMPARISON) must still be included
prompt_minimal = (
    "Provide a description of the findings in the radiology study in comparison to the prior frontal image. "
    "INDICATION: None. TECHNIQUE: None. COMPARISON: None."
)

# 🧪 Inference example following the official MAIRA-2 setup
from libra.eval import libra_eval

frontal_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-1001.png"
model_path = "X-iZhang/libra-maira-2"

answer = libra_eval(
    model_path=model_path,
    image_file=[frontal_image_url],
    query=prompt_with_clinical,
    conv_mode="maira_2",
    temperature=0.0,         # Use greedy decoding
    max_new_tokens=300,
)

# ✅ Expected output
print(answer)
# > There is a large right pleural effusion.
# > No pneumothorax is identified.
# > There is no left pleural effusion.
# > There is no focal consolidation.
# > The cardiomediastinal silhouette is within normal limits.
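Because the INDICATION, TECHNIQUE, and COMPARISON placeholders must always be present, it can help to build the prompt programmatically. The sketch below reproduces the two prompt strings shown above; build_findings_prompt and INSTRUCTION are hypothetical names introduced for illustration and are not part of the Libra or MAIRA-2 APIs.

```python
# Instruction sentence copied from the prompts in this model card.
INSTRUCTION = (
    "Provide a description of the findings in the radiology study "
    "in comparison to the prior frontal image."
)


def build_findings_prompt(indication=None, technique=None, comparison=None):
    """Return a findings prompt; missing fields are filled with 'None'."""

    def field(label, value):
        # Drop surrounding whitespace and a trailing period so that exactly
        # one period follows each field value.
        text = (value or "").strip().rstrip(".")
        return f"{label}: {text or 'None'}."

    return " ".join(
        [
            INSTRUCTION,
            field("INDICATION", indication),
            field("TECHNIQUE", technique),
            field("COMPARISON", comparison),
        ]
    )


# Same text as prompt_minimal above.
print(build_findings_prompt())
```

The returned string can be passed as the query argument in the inference example above.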

📚 Learn More

For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:

🔗 Project Website: Libra v1.0
📄 Paper: arXiv:2411.19378
💻 Code Repository: X-iZhang/Libra (GitHub)
📷 Related Project: CCD – Clinical Change Detection; see technical details in the paper here.

Disclaimer

This implementation is intended strictly for research and benchmarking purposes. It is not validated for clinical use, and any application in real-world diagnosis or treatment is strongly discouraged.

If any use case is found to violate these intended purposes (e.g., clinical deployment, misleading medical claims), the maintainers reserve the right to remove related code, models, or access permissions without prior notice.