arxiv:2505.07819

H^{3}DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning

Published on May 12

· Submitted by

Lyy0725 on May 13

Upvote

Authors:

Yiyang Lu ,

Xianbang Wang ,

Abstract

Visuomotor policy learning has witnessed substantial progress in robotic manipulation, with recent approaches predominantly relying on generative models to model the action distribution. However, these methods often overlook the critical coupling between visual perception and action prediction. In this work, we introduce Triply-Hierarchical Diffusion Policy~(H^{\mathbf{3}DP}), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. H^{3}DP contains 3 levels of hierarchy: (1) depth-aware input layering that organizes RGB-D observations based on depth information; (2) multi-scale visual representations that encode semantic features at varying levels of granularity; and (3) a hierarchically conditioned diffusion process that aligns the generation of coarse-to-fine actions with corresponding visual features. Extensive experiments demonstrate that H^{3}DP yields a +27.5% average relative improvement over baselines across 44 simulation tasks and achieves superior performance in 4 challenging bimanual real-world manipulation tasks. Project Page: https://lyy-iiis.github.io/h3dp/.

View arXiv page View PDF Project page Add to collection

Community

Lyy0725

Paper author Paper submitter about 21 hours ago

This paper, 'H³DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning,' presents a novel approach using a triply-hierarchical diffusion policy for visuomotor learning tasks. It's a valuable contribution to the fields of robotics.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.07819 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.07819 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.07819 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.