SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)

[arXiv Paper](https://arxiv.org/abs/2502.07945) | Hugging Face Spaces

💡 Key Features

  • We show that Scene Graphs (SGs) can encode surgical scenes in a human-readable format.
  • We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are employed to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
  • We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts.
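
As a purely illustrative example of such a human-readable encoding, the sketch below builds a toy SG with networkx. The node classes, bounding-box attributes, and relation labels are assumptions for readability, not the exact schema of the processed CaDIS scene graphs.

import networkx as nx

# Build a toy scene graph: nodes are scene entities, edges are spatial relations.
sg = nx.DiGraph()

# Nodes carry a semantic class and a normalised bounding box (x1, y1, x2, y2).
sg.add_node("pupil", cls="Pupil", bbox=(0.35, 0.30, 0.65, 0.70))
sg.add_node("iris", cls="Iris", bbox=(0.25, 0.20, 0.75, 0.80))
sg.add_node("instrument", cls="Phaco. Handpiece", bbox=(0.55, 0.10, 0.95, 0.45))

# Edges express human-readable spatial relations between entities.
sg.add_edge("iris", "pupil", relation="surrounds")
sg.add_edge("instrument", "pupil", relation="overlaps")

for u, v, data in sg.edges(data=True):
    print(f"{u} --{data['relation']}--> {v}")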

🛠 Setup

git clone https://github.com/MECLabTUDA/SurGrID.git
cd SurGrID
conda create -n surgrid python=3.8.5 pip=20.3.3
conda activate surgrid

pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
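
To check that the CUDA 11.8 build of PyTorch was picked up correctly, a quick optional sanity check is:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"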

🏁 Model Checkpoints and Dataset

Download the checkpoints of all the necessary models from the provided sources and place them in [results](./results). We also provide the processed CaDIS dataset, containing images, segmentation masks, and their scene graphs. Update the dataset paths in [configs](./configs).
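
The relevant key names differ between the individual config files; one simple way to locate the path entries to update is to load a config and dump it for inspection (a minimal sketch, assuming PyYAML is available in the environment):

import yaml

# Load one of the provided configs and print it to locate the dataset/checkpoint
# path entries that have to point to your local copy of the data.
with open("configs/eval/eval_combined_emb.yaml") as f:
    cfg = yaml.safe_load(f)
print(yaml.safe_dump(cfg, sort_keys=False))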

💥 Sampling SurGrID

python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml

⏳ Training SurGrID

Step 1: Train Separate VQGANs for Images and Segmentation Masks

python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,

Step 2: Train Both Graph Encoders

python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml

Step 3: Train Diffusion Model

python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml

🔄 Training SurGrID on a New Dataset

The files below need to be adapted:

🥼 Clinical Expert Assessment

python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml

Our demo GUI allows loading ground-truth graphs along with the corresponding ground-truth image. The graph’s nodes can be moved, deleted, or have their class changed. We instruct our participants to load four different ground-truth graphs and sequentially perform the following actions on each, scoring the samples’ realism and coherence with the graph input on a Likert scale from 1 to 7:

  • First, participants are instructed to generate a batch of four samples from the ground-truth SG without modifications.
  • Second, participants are requested to spatially move nodes in the canvas and again judge the synthesised samples.
  • Third, participants change the class of one of the instrument nodes and judge the generated images.
  • Lastly, participants are instructed to remove one of the instrument or miscellaneous classes and judge the synthesised image a final time.
| Clinician | Synthesis from GT (Realism / Coherence) | Spatial Modification (Realism / Coherence) | Tool Modification (Realism / Coherence) | Tool Removal (Realism / Coherence) |
| --- | --- | --- | --- | --- |
| P1 | 6.5±0.5 / 6.5±1.0 | 6.3±0.9 / 6.3±0.9 | 5.3±1.2 / 4.5±1.9 | 6.3±0.9 / 5.5±2.3 |
| P2 | 5.3±0.9 / 5.3±0.5 | 4.5±0.5 / 4.3±2.0 | 5.3±0.9 / 5.8±0.9 | 5.5±1.2 / 5.5±1.9 |
| P3 | 6.3±0.9 / 6.3±0.9 | 6.5±1.0 / 5.5±0.5 | 6.0±0.8 / 6.8±0.5 | 6.3±0.5 / 6.5±0.5 |
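
Each cell above reports mean ± standard deviation of one participant’s Likert ratings, presumably aggregated over the four loaded graphs per condition. A minimal sketch of this aggregation with made-up ratings:

import numpy as np

# Hypothetical Likert ratings (1-7) from one participant across four graphs.
ratings = np.array([7, 6, 6, 5])
print(f"{ratings.mean():.1f} ± {ratings.std():.1f}")  # prints "6.0 ± 0.7" (population std)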

📜 Citations

If you use SurGrID in your work, please cite the following paper:

@article{frisch2025surgrid,
  title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
  author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
  journal={arXiv preprint arXiv:2502.07945},
  year={2025}
}

⭐ Acknowledgement

Thanks to the following projects and theoretical works that we have used or drawn inspiration from:
