TextBraTS
A public volume-level text-image dataset with a novel text-guided 3D brain tumor segmentation method, built on the BraTS challenge.
Introduction
TextBraTS is an open-access dataset designed to advance research in text-guided 3D brain tumor segmentation. It includes paired multi-modal brain MRI scans and expert-annotated radiology reports, enabling the development and evaluation of multi-modal deep learning models that bridge vision and language in neuro-oncology. Our work has been accepted at MICCAI 2025. The paper is also available on arXiv.
Features
- Multi-modal 3D brain MRI scans (T1, T1ce, T2, FLAIR) with expert-annotated segmentations from the BraTS 2020 challenge training set
- Structured radiology reports paired with each case (a loading sketch follows this list)
- Text-image alignment method for research on multi-modal fusion
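For orientation, here is a minimal sketch of reading one paired case after the data have been merged. The folder path, case ID, and report file name are assumptions about the merged layout; only the standard BraTS 2020 modality and segmentation suffixes are taken as given.

import os
import nibabel as nib

# Example paths -- adjust to wherever you placed the merged data; the folder
# and report file names below are assumptions about the merged layout.
case_dir = "/path/to/BraTS2020_TrainingData/BraTS20_Training_001"
case_id = os.path.basename(case_dir)

# Load the four MRI modalities (each BraTS 2020 volume is 240x240x155).
modalities = {}
for suffix in ("t1", "t1ce", "t2", "flair"):
    nii = nib.load(os.path.join(case_dir, f"{case_id}_{suffix}.nii.gz"))
    modalities[suffix] = nii.get_fdata()

# Load the expert segmentation and the paired radiology report.
seg = nib.load(os.path.join(case_dir, f"{case_id}_seg.nii.gz")).get_fdata()
with open(os.path.join(case_dir, f"{case_id}_report.txt")) as f:  # report name is an assumption
    report = f.read()

print(case_id, modalities["flair"].shape, seg.shape, report[:80])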
Usage
You can use this dataset for:
- Developing and benchmarking text-guided segmentation models
- Evaluating multi-modal fusion algorithms in medical imaging
- Research in language-driven medical AI
Installing Dependencies
Run the following commands to set up the environment:
conda env create -f environment.yml
pip install git+https://github.com/Project-MONAI/MONAI.git@07de215c
If you need to activate the environment, use:
conda activate TextBraTS
Dataset
In accordance with the official BraTS guidelines, the MRI images must be downloaded directly from the BraTS 2020 challenge website (training set).
Download our text, feature, and prompt files:
You can download our dataset from Google Drive or Hugging Face.
Our text reports, feature files, and prompt files are named to match the original BraTS folder IDs exactly. Set the data paths and merge them with the downloaded MRI data using merge.py:
python merge.py
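merge.py performs this step for you. Purely as an illustration of the idea, a minimal sketch is shown below; both root paths and the assumption that the downloaded text/feature/prompt files sit flat in one folder, prefixed with the BraTS case ID, should be adjusted to the actual download.

import shutil
from pathlib import Path

# Illustration only -- merge.py already does this. Both root paths and the
# flat file layout of the downloaded TextBraTS files are assumptions.
text_root = Path("/path/to/TextBraTS_download")     # text, feature, and prompt files
mri_root = Path("/path/to/BraTS2020_TrainingData")  # official BraTS 2020 training MRI data

for case_dir in sorted(p for p in mri_root.iterdir() if p.is_dir()):
    # TextBraTS files share the BraTS folder ID (e.g. BraTS20_Training_001),
    # so a glob on that prefix pairs them with the corresponding MRI case.
    for src in text_root.glob(f"{case_dir.name}*"):
        shutil.copy(src, case_dir / src.name)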
If you would like to change the dataset split, modify the Train.json and Test.json files accordingly.
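If you prefer to script the re-split rather than edit the files by hand, something like the sketch below could work. The JSON structure it assumes (a MONAI decathlon-style "training" list of per-case entries in each file) is an assumption, so check the actual keys in Train.json and Test.json first.

import json
import random

# Assumed structure: each file holds a "training" list of per-case entries --
# verify the real keys in Train.json / Test.json before relying on this.
with open("Train.json") as f:
    train = json.load(f)
with open("Test.json") as f:
    test = json.load(f)

cases = train["training"] + test["training"]
random.seed(0)
random.shuffle(cases)

cut = int(0.8 * len(cases))  # new 80/20 split
train["training"], test["training"] = cases[:cut], cases[cut:]

with open("Train.json", "w") as f:
    json.dump(train, f, indent=2)
with open("Test.json", "w") as f:
    json.dump(test, f, indent=2)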
Inference
We provide our pre-trained weights for direct inference and evaluation.
Download the weights from checkpoint.
After downloading, place the weights in your desired directory, then run test.py with the following command for inference:
python test.py --pretrained_dir=/path/to/your/weights/ --exp_name=TextBraTS
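test.py wraps the full evaluation pipeline. Purely to illustrate how volumetric inference typically runs with MONAI, here is a generic sketch using a plain Swin UNETR backbone and sliding-window inference; it is not the TextBraTS text-guided architecture, and the checkpoint file name, feature size, and ROI size are assumptions.

import torch
from monai.inferers import sliding_window_inference
from monai.networks.nets import SwinUNETR

# Generic sketch: the released TextBraTS checkpoint belongs to a text-guided
# model built around a Swin UNETR backbone, so this plain SwinUNETR will not
# load it as-is. Channel counts follow BraTS (4 modalities in, 3 regions out);
# the checkpoint file name and ROI size are assumptions.
model = SwinUNETR(img_size=(128, 128, 128), in_channels=4, out_channels=3, feature_size=48)
state = torch.load("/path/to/your/weights/model.pt", map_location="cpu")
model.load_state_dict(state.get("state_dict", state), strict=False)
model.eval()

volume = torch.randn(1, 4, 240, 240, 155)  # stand-in for a preprocessed BraTS case
with torch.no_grad():
    logits = sliding_window_inference(volume, roi_size=(128, 128, 128),
                                      sw_batch_size=1, predictor=model, overlap=0.5)
    pred = (torch.sigmoid(logits) > 0.5).float()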
Training
If you would like to train the model from scratch, you can modify the training code in main.py and run the following command:
python main.py --distributed --use_ssl_pretrained --save_checkpoint --logdir=TextBraTS
- The --use_ssl_pretrained option uses the pre-trained weights from NVIDIA's Swin UNETR model, as sketched below.
- Download the Swin UNETR pre-trained weights from Pre-trained weights.
- Please place the downloaded weights in the appropriate directory as specified in your configuration or script.
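For reference, the standard MONAI pattern for initializing Swin UNETR from NVIDIA's self-supervised weights looks roughly like the sketch below; the exact handling inside main.py may differ, and the checkpoint path and file name are assumptions.

import torch
from monai.networks.nets import SwinUNETR

# Rough sketch of what --use_ssl_pretrained typically does with MONAI's
# SwinUNETR: copy NVIDIA's self-supervised SwinViT encoder weights into the
# backbone. The checkpoint path and file name below are assumptions.
model = SwinUNETR(img_size=(128, 128, 128), in_channels=4, out_channels=3, feature_size=48)
ssl_weights = torch.load("./pretrained_models/model_swinvit.pt", map_location="cpu")
model.load_from(weights=ssl_weights)  # MONAI helper for the SSL pre-training checkpoint format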
Citation
If you use TextBraTS in your research, please cite:
@inproceedings{shi2025textbrats,
title = {TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration},
author = {Shi, Xiaoyu and Jain, Rahul Kumar and Li, Yinhao and Hou, Ruibo and Cheng, Jingliang and Bai, Jie and Zhao, Guohua and Lin, Lanfen and Xu, Rui and Chen, Yen-wei},
booktitle = {Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)},
year = {2025},
note = {to appear}
}