R3D: Revisiting 3D Policy Learning

Project Page | Paper | Code

R3D is a 3D imitation learning policy architecture aimed at better generalization and cross-embodiment transfer in robotics. It addresses the training instabilities and overfitting common in 3D policy learning by coupling a scalable transformer-based 3D encoder with a diffusion decoder.

Overview

R3D diagnoses failures in scaling 3D policy learning and proposes a robust foundation for scalable 3D imitation learning through several key innovations:

  • Robust Scaling with Layer Normalization: Overcomes the "scaling paradox" by replacing Batch Normalization with Layer Normalization, enabling stable training of high-capacity 3D encoders (e.g., Uni3D).
  • Comprehensive 3D Data Augmentation: Reduces overfitting with a pipeline of FPS (farthest point sampling) randomization, color jitter, additive noise, and random point dropout.
  • Spatially-Aware Decoding: Employs cross-attention between action queries and dense geometric tokens to preserve high-resolution spatial information.
  • Large-Scale 3D Pre-training: Uses encoders pre-trained on diverse 3D datasets, whose geometric priors accelerate convergence and improve feature representations.
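
To make the augmentation bullet concrete, here is a minimal NumPy sketch of the four steps it names, applied to a colored point cloud. It is an illustration under stated assumptions, not the repository's implementation: the function names, the parameter values (`jitter`, `noise_std`, `drop_prob`), the reading of "FPS randomization" as farthest point sampling from a random start index, and the dropout-by-duplication trick are all hypothetical choices made for this example.

```python
import numpy as np


def farthest_point_sampling(xyz, num_samples, rng):
    """Greedy farthest point sampling; returns indices into xyz.

    The start index is drawn at random, so repeated calls on the same
    cloud return different subsets (an assumed reading of "FPS randomization").
    """
    n = xyz.shape[0]
    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = rng.integers(n)
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(xyz - xyz[selected[0]], axis=1)
    for i in range(1, num_samples):
        selected[i] = int(np.argmax(min_dist))
        dist = np.linalg.norm(xyz - xyz[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected


def augment_point_cloud(points, num_samples, rng,
                        jitter=0.1, noise_std=0.005, drop_prob=0.1):
    """Augment an (N, 6) array of xyz + rgb, with rgb assumed to lie in [0, 1]."""
    xyz, rgb = points[:, :3].copy(), points[:, 3:6].copy()

    # 1. FPS with a random starting point.
    idx = farthest_point_sampling(xyz, num_samples, rng)
    xyz, rgb = xyz[idx], rgb[idx]

    # 2. Color jitter: one random scale and per-channel offset for the cloud.
    scale = 1.0 + rng.uniform(-jitter, jitter)
    shift = rng.uniform(-jitter, jitter, size=3)
    rgb = np.clip(rgb * scale + shift, 0.0, 1.0)

    # 3. Additive Gaussian noise on the coordinates.
    xyz = xyz + rng.normal(0.0, noise_std, size=xyz.shape)

    # 4. Random point dropout: dropped points are overwritten with the
    #    first point so the output keeps a fixed size.
    drop = rng.random(num_samples) < drop_prob
    drop[0] = False
    xyz[drop] = xyz[0]
    rgb[drop] = rgb[0]

    return np.concatenate([xyz, rgb], axis=1)


rng = np.random.default_rng(0)
cloud = rng.random((2048, 6))  # synthetic stand-in for an observed scene
augmented = augment_point_cloud(cloud, num_samples=512, rng=rng)
print(augmented.shape)  # (512, 6)
```

Overwriting dropped points with a kept point, rather than deleting them, keeps every sample the same shape, which makes batching straightforward. Whether R3D handles dropout this way is not stated here.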

Training Examples

Below are examples for training the policy on the RoboTwin and ManiSkill benchmarks:

RoboTwin

conda activate r3d && bash scripts/train_robotwin2_single.sh r3d_robotwin2 place_shoe 0000 0 0
# For DDP multi-GPU training, replace the single GPU ID (e.g. "0") with a comma-separated list (e.g. "0,1")
conda activate r3d && bash scripts/train_robotwin2_single.sh r3d_robotwin2 place_shoe 0000 0 0,1

ManiSkill

conda activate r3d_maniskill && bash scripts/train_maniskill_single.sh r3d_maniskill PickCube 0000 0 0
# For DDP multi-GPU training, replace the single GPU ID (e.g. "0") with a comma-separated list (e.g. "0,1")
conda activate r3d_maniskill && bash scripts/train_maniskill_single.sh r3d_maniskill PickCube 0000 0 0,1

Citation

@article{hong2024r3d,
  title={R3D: Revisiting 3D Policy Learning},
  author={Zhengdong Hong and Shenrui Wu and Haozhe Cui and Boyi Zhao and Ran Ji and Yiyang He and Hangxing Zhang and Zundong Ke and Jun Wang and Guofeng Zhang and Jiayuan Gu},
  journal={arXiv preprint arXiv:2604.15281},
  year={2026}
}

Model size: 13.9M params (Safetensors, tensor type F32)