Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation
Abstract
Shortcut learning in generalist robot policies trained on large-scale datasets limits generalization, and this can be mitigated through improved dataset collection and data augmentation strategies.
Generalist robot policies trained on large-scale datasets such as Open X-Embodiment (OXE) demonstrate strong performance across a wide range of tasks. However, they often struggle to generalize beyond the distribution of their training data. In this paper, we investigate the underlying cause of this limited generalization capability. We identify shortcut learning -- the reliance on task-irrelevant features -- as a key impediment to generalization. Through comprehensive theoretical and empirical analysis, we uncover two primary contributors to shortcut learning: (1) limited diversity within individual sub-datasets, and (2) significant distributional disparities across sub-datasets, leading to dataset fragmentation. These issues arise from the inherent structure of large-scale datasets like OXE, which are typically composed of multiple sub-datasets collected independently across varied environments and embodiments. Our findings provide critical insights into dataset collection strategies that can reduce shortcut learning and enhance the generalization ability of generalist robot policies. Moreover, in scenarios where acquiring new large-scale data is impractical, we demonstrate that carefully selected robotic data augmentation strategies can effectively reduce shortcut learning in existing offline datasets, thereby improving the generalization capabilities of generalist robot policies, e.g., π0, in both simulation and real-world environments. More information at https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/.
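The link between fragmentation and shortcut learning can be illustrated with a toy calculation. The sketch below is not the paper's analysis or metric; it assumes a hypothetical dataset in which each sample carries one task-irrelevant factor (camera viewpoint) and one task label, and it measures how much the irrelevant factor reveals about the task via mutual information. The factor names, task names, and sample counts are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between two discrete variables,
    estimated from a list of (u, v) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    p_u = Counter(u for u, _ in pairs)
    p_v = Counter(v for _, v in pairs)
    return sum(
        (c / n) * log2((c / n) / ((p_u[u] / n) * (p_v[v] / n)))
        for (u, v), c in joint.items()
    )


# Hypothetical fragmented collection: each sub-dataset has a single
# viewpoint and a single task, so viewpoint alone identifies the task.
fragmented = [("left_cam", "pick_cup")] * 50 + [("right_cam", "open_drawer")] * 50

# Hypothetical diverse collection: every viewpoint appears with every task
# equally often, so viewpoint carries no information about the task.
diverse = [
    (view, task)
    for view in ("left_cam", "right_cam")
    for task in ("pick_cup", "open_drawer")
] * 25

mi_fragmented = mutual_information(fragmented)
mi_diverse = mutual_information(diverse)

print(f"fragmented: {mi_fragmented:.2f} bits")
print(f"diverse:    {mi_diverse:.2f} bits")
```

In the fragmented case the viewpoint fully determines the task (1 bit for two equally likely tasks), so a policy could in principle succeed on the training data by keying on the viewpoint rather than the instruction or the scene. In the diverse case that shortcut is unavailable.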
Community
[CoRL 2025] Why do generalist robot policies often struggle to generalize beyond their training data? And how can this insight inform how we collect robot datasets? We dive into these questions in our latest work!
Project page: https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Is Diversity All You Need for Scalable Robotic Manipulation? (2025)
- CUPID: Curating Data your Robot Loves with Influence Functions (2025)
- H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation (2025)
- ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models (2025)
- CLASS: Contrastive Learning via Action Sequence Supervision for Robot Manipulation (2025)
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos (2025)
- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos (2025)