MAIRA-2 (finetuned from Vicuna-7B, RAD-DINO)
MAIRA-2 is a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays. MAIRA-2 was built for research purposes only and is shared to facilitate comparison and further research.
Note: For original model weights, refer to microsoft/maira-2.
Original paper: MAIRA-2: Grounded Radiology Report Generation.
Experimental Usage in Libra's repo
This model checkpoint is intended for experimental use and can be tested directly within the Libra repository.
For better benchmarking, we recommend using the official test set from X-iZhang/MIMIC-CXR-RRG.
Key Modification
To enable the re-trained vision encoder during inference and to follow the MAIRA-2 behaviour of using feature_maps from the Dinov2Backbone (i.e., hidden states with LayerNorm applied, instead of raw hidden_states), make sure to apply the following configuration:
"unfreeze_mm_vision_tower": true,
"use_maira_feature_norm": true
- This setting is specifically designed for findings-section generation from a single frontal-view chest X-ray.
- It is not applicable to grounding tasks or settings involving multiple image inputs.
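As a minimal sketch of applying the two settings above, the snippet below assumes the checkpoint's configuration is stored as a JSON file (for example a local config.json, which is an assumption and not confirmed by this card). The helper name apply_maira_flags and the file path are hypothetical; only the two key names come from the text above.

```python
import json
from pathlib import Path

# The two keys below are taken from the model card; the function name and
# the file location are illustrative assumptions.
MAIRA_FLAGS = {
    "unfreeze_mm_vision_tower": True,
    "use_maira_feature_norm": True,
}


def apply_maira_flags(config_path):
    """Set both MAIRA-2 flags in a JSON config file and return the updated dict."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Existing keys are preserved; only the two flags are added or overwritten.
    config.update(MAIRA_FLAGS)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling apply_maira_flags("path/to/checkpoint/config.json") on a local copy would rewrite that file in place, so it may be worth keeping a backup of the original configuration first.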
Learn More
For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:
- Project Website: Libra v1.0
- Paper: arXiv:2411.19378
- Code Repository: X-iZhang/Libra (GitHub)
Disclaimer
This implementation is intended strictly for research and benchmarking purposes. It is not validated for clinical use, and any application in real-world diagnosis or treatment is strongly discouraged.
If any use case is found to violate these intended purposes (e.g., clinical deployment, misleading medical claims), the maintainers reserve the right to remove related code, models, or access permissions without prior notice.
License
MSRLA license.
Model tree for X-iZhang/libra-maira-2
Base model
microsoft/maira-2