SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
Abstract
A hierarchical framework combining high-level task planning and low-level trajectory generation enables autonomous surgical procedures with high success rates in ex vivo experiments.
Research on autonomous surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications demand dexterous manipulation over extended durations and generalization to the inherent variability of human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning approaches. To address this gap, we propose a hierarchical framework for performing dexterous, long-horizon surgical steps. Our approach uses a high-level policy for task planning and a low-level policy for generating robot trajectories. The high-level planner plans in language space, generating task-level or corrective instructions that guide the robot through long-horizon steps and correct for the low-level policy's errors. We validate our framework through ex vivo experiments on cholecystectomy, a commonly practiced minimally invasive procedure, and conduct ablation studies to evaluate key components of the system. Our method achieves a 100% success rate across eight unseen ex vivo gallbladders, operating fully autonomously without human intervention. This work demonstrates step-level autonomy in a surgical procedure, marking a milestone toward clinical deployment of autonomous surgical systems.
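The abstract describes a two-level architecture: a high-level policy that plans in language space and a low-level language-conditioned policy that produces robot trajectories. The control loop implied by that description can be sketched as below. This is a minimal illustrative sketch only; every class, method, and task string here is a hypothetical stand-in, not the authors' actual SRT-H implementation or API.

```python
# Hypothetical sketch of a hierarchical language-conditioned control loop:
# a high-level planner emits task-level instructions in natural language,
# and a low-level policy maps (observation, instruction) to trajectory
# segments. All names are illustrative assumptions, not the paper's code.
from dataclasses import dataclass


@dataclass
class Observation:
    """Placeholder for endoscope images and robot state."""
    step: int


class HighLevelPlanner:
    """Plans in language space, issuing one instruction at a time."""

    def __init__(self, task_steps):
        self.task_steps = task_steps
        self.idx = 0

    def next_instruction(self, obs):
        # The real planner would also inspect the observation and may
        # issue corrective instructions; here we just walk the task plan.
        if self.idx >= len(self.task_steps):
            return None  # procedure step complete
        instruction = self.task_steps[self.idx]
        self.idx += 1
        return instruction


class LowLevelPolicy:
    """Stand-in for a learned language-conditioned visuomotor policy."""

    def act(self, obs, instruction):
        # Returns a short trajectory segment for the given instruction.
        return [f"waypoint for '{instruction}'"]


def run_episode(planner, policy, max_steps=20):
    trajectory = []
    for step in range(max_steps):
        obs = Observation(step=step)
        instruction = planner.next_instruction(obs)
        if instruction is None:
            break
        trajectory.extend(policy.act(obs, instruction))
    return trajectory


# Example task decomposition (illustrative subtask names only).
steps = ["grasp the infundibulum", "clip the cystic duct", "cut the cystic duct"]
traj = run_episode(HighLevelPlanner(steps), LowLevelPolicy())
print(len(traj))  # one trajectory segment per instruction
```

The key design point the sketch captures is the interface: the two policies communicate only through natural-language instructions, which lets the planner guide long-horizon execution and issue corrections without accessing the low-level policy's internals.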
Community
Thanks for sharing!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity (2025)
- Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance (2025)
- SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models (2025)
- EnerVerse-AC: Envisioning Embodied Environments with Action Condition (2025)
- Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space (2025)
- Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining (2025)
- HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control (2025)