---
license: cc-by-4.0
---
# SASVi - Segment Any Surgical Video (IPCAI 2025)

[![arXiv](https://img.shields.io/badge/arXiv-2502.09653-b31b1b.svg)](https://arxiv.org/abs/2502.09653) [![Paper](https://img.shields.io/badge/Paper-Visit-blue)](https://link.springer.com/article/10.1007/s11548-025-03408-y) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SASVi)
## Overview

SASVi leverages pre-trained frame-wise object detection and segmentation models to re-prompt SAM2 for improved surgical video segmentation with scarcely annotated data.

## Example Results

* You can find the complete segmentations of the video datasets [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/dataset).
* Checkpoints of all the overseers can be found [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/checkpoints).

## Setup

* Create a virtual environment of your choice and activate it: `conda create -n sasvi python=3.11 && conda activate sasvi`
* Install `torch>=2.3.1` and `torchvision>=0.18.1` following the instructions from [here](https://pytorch.org/get-started/locally/)
* Install the dependencies using `pip install -r requirements.txt`
* Install SDS_Playground from [here](https://github.com/MECLabTUDA/SDS_Playground)
* Install SAM2 using `cd src/sam2 && pip install -e .`
* Place SAM2 [checkpoints](https://github.com/facebookresearch/sam2/tree/main#model-description) at `src/sam2/checkpoints`
* Convert video files to frame folders using `bash helper_scripts/video_to_frames.sh`. The output should be in the format:

```
├── <video_name_1>
│   ├── 0001.jpg
│   ├── 0002.jpg
│   └── ...
├── <video_name_2>
│   ├── 0001.jpg
│   ├── 0002.jpg
│   └── ...
└── ...
```

## Overseer Model Training

We provide training scripts for three different overseer models (Mask R-CNN, DETR, Mask2Former) on three different datasets (CaDIS, CholecSeg8k, Cataract1k). You can run the training scripts as follows:

`python train_scripts/train_<OVERSEER>_<DATASET>.py`

## SASVi Inference

The video frames need to be extracted beforehand and placed in the format shown above. More optional arguments can be found directly in the script.

```
python src/sam2/eval_sasvi.py \
    --sam2_cfg configs/sam2.1_hiera_l.yaml \
    --sam2_checkpoint ./checkpoints/<SAM2_CHECKPOINT>.pt \
    --overseer_checkpoint <OVERSEER_CHECKPOINT>.pth \
    --overseer_type <OVERSEER_TYPE> \
    --dataset_type <DATASET_TYPE> \
    --base_video_dir <PATH_TO_VIDEO_FRAMES> \
    --output_mask_dir <PATH_TO_OUTPUT_MASKS> \
    --overseer_mask_dir <PATH_TO_OVERSEER_MASKS>
```

## nnUNet Training & Inference

* Fold 0: `nnUNetv2_train DATASET_ID 2d 0 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
* Fold 1: `nnUNetv2_train DATASET_ID 2d 1 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
* Fold 2: `nnUNetv2_train DATASET_ID 2d 2 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
* Fold 3: `nnUNetv2_train DATASET_ID 2d 3 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
* Fold 4: `nnUNetv2_train DATASET_ID 2d 4 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
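The five per-fold calls above differ only in the fold index, so they can also be run in a short shell loop (a convenience sketch; `DATASET_ID` is the ID under which you registered the dataset, and `FOLD` is just the loop variable introduced here):

```
# Sketch: run all five folds of the 2D configuration sequentially
for FOLD in 0 1 2 3 4; do
    nnUNetv2_train DATASET_ID 2d "$FOLD" -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz
done
```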
Then find the best configuration using:

`nnUNetv2_find_best_configuration DATASET_ID -c 2d -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs`

And run inference using:

`nnUNetv2_predict -d DATASET_ID -i INPUT_FOLDER -o OUTPUT_FOLDER -f 0 1 2 3 4 -tr nnUNetTrainer_400epochs -c 2d -p nnUNetResEncUNetMPlans`

Once inference is completed, run postprocessing:

`nnUNetv2_apply_postprocessing -i OUTPUT_FOLDER -o OUTPUT_FOLDER_PP -pp_pkl_file .../postprocessing.pkl -np 8 -plans_json .../plans.json`

## Evaluation

* For frame-wise segmentation evaluation:
  * `python eval_scripts/eval_<OVERSEER>_frames.py`
* For frame-wise segmentation prediction on full videos:
  * See `python eval_scripts/eval_MaskRCNN_videos.py` for an example.
* For video evaluation:
  1. E.g. `python eval_scripts/eval_vid_T.py --segm_root <SEGM_ROOT> --vid_pattern 'train' --mask_pattern '*.npz' --ignore 255 --device cuda`
  2. E.g. `python eval_scripts/eval_vid_F.py --segm_root <SEGM_ROOT> --frames_root <FRAMES_ROOT> --vid_pattern 'train' --frames_pattern '*.jpg' --mask_pattern '*.npz' --raft_iters 12 --device cuda`

## TODOs

* [ ] **The code will be refactored soon to be more modular and reusable!**
* [ ] Pre-process Cholec80 videos with out-of-body detection
* [ ] Improve SASVi by combining it with GT prompting (if available)
* [ ] Test SAM2 finetuning

## Citation

If you use SASVi in your research, please cite our paper:

```
@article{sivakumar2025sasvi,
  title={SASVi: segment any surgical video},
  author={Sivakumar, Ssharvien Kumar and Frisch, Yannik and Ranem, Amin and Mukhopadhyay, Anirban},
  journal={International Journal of Computer Assisted Radiology and Surgery},
  pages={1--11},
  year={2025},
  publisher={Springer}
}
```