R3D: Revisiting 3D Policy Learning
Project Page | Paper | Code
R3D is a 3D imitation learning policy architecture for robotics, designed for generalization and cross-embodiment transfer. It addresses the training instabilities and overfitting common in 3D policy learning by coupling a scalable transformer-based 3D encoder with a diffusion decoder.
R3D diagnoses failures in scaling 3D policy learning and proposes a robust foundation for scalable 3D imitation learning through several key innovations:
Below are examples for training the policy on the RoboTwin and ManiSkill benchmarks:
conda activate r3d && bash scripts/train_robotwin2_single.sh r3d_robotwin2 place_shoe 0000 0 0
# For DDP multi-GPU training, replace the single GPU ID (e.g. "0") in the last argument with a comma-separated list (e.g. "0,1")
conda activate r3d && bash scripts/train_robotwin2_single.sh r3d_robotwin2 place_shoe 0000 0 0,1
conda activate r3d_maniskill && bash scripts/train_maniskill_single.sh r3d_maniskill PickCube 0000 0 0
# For DDP multi-GPU training, replace the single GPU ID (e.g. "0") in the last argument with a comma-separated list (e.g. "0,1")
conda activate r3d_maniskill && bash scripts/train_maniskill_single.sh r3d_maniskill PickCube 0000 0 0,1
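The commands above differ only in the script, config name, task name, and GPU list, so a small wrapper can make the single-GPU versus multi-GPU choice explicit. The following is a minimal sketch, not part of the R3D repository: `count_gpus` and `build_train_cmd` are hypothetical helper names, and the `0000` and `0` fields are copied verbatim from the examples above without assuming what they mean. The helper only prints the command, so it can be inspected before running.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the R3D repository.

# Count the GPU IDs in a comma-separated list such as "0" or "0,1".
count_gpus() {
  if [ -z "$1" ]; then
    echo 0
    return
  fi
  # Keep only the commas; N commas separate N+1 IDs.
  commas=$(printf '%s' "$1" | tr -cd ',')
  echo $(( ${#commas} + 1 ))
}

# Print (without running) a training command.
# Arguments: script path, config name, task name, GPU list.
# The "0000" and "0" fields are taken as-is from the README examples.
build_train_cmd() {
  printf 'bash %s %s %s 0000 0 %s\n' "$1" "$2" "$3" "$4"
}

gpus="0,1"
if [ "$(count_gpus "$gpus")" -gt 1 ]; then
  echo "Multi-GPU (DDP) run requested on GPUs: $gpus"
fi
build_train_cmd scripts/train_robotwin2_single.sh r3d_robotwin2 place_shoe "$gpus"
```

Activate the matching conda environment (`r3d` or `r3d_maniskill`) before running the printed command.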
@article{hong2024r3d,
title={R3D: Revisiting 3D Policy Learning},
author={Zhengdong Hong and Shenrui Wu and Haozhe Cui and Boyi Zhao and Ran Ji and Yiyang He and Hangxing Zhang and Zundong Ke and Jun Wang and Guofeng Zhang and Jiayuan Gu},
journal={arXiv preprint arXiv:2604.15281},
year={2026}
}