arxiv:2509.03790

What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?

Published on Sep 4

Authors:

Abstract

Low-rank structure in reward matrices enables efficient sparse-reward reinforcement learning by reducing sample complexity, and Policy-Aware Matrix Completion (PAMC) leverages this structure to improve sample efficiency.

AI-generated summary

What fundamental properties of reward functions enable efficient sparse-reward reinforcement learning? We address this question through the lens of low-rank structure in reward matrices, showing that such structure induces a sharp transition from exponential to polynomial sample complexity, the first result of this kind for sparse-reward RL. We introduce Policy-Aware Matrix Completion (PAMC), which connects matrix completion theory with reinforcement learning via a new analysis of policy-dependent sampling. Our framework provides: (i) impossibility results for general sparse reward observation, (ii) reward-free representation learning from dynamics, (iii) distribution-free confidence sets via conformal prediction, and (iv) robust completion guarantees that degrade gracefully when low-rank structure is only approximate. Empirically, we conduct a pre-registered evaluation across 100 systematically sampled domains, finding exploitable structure in over half. PAMC improves sample efficiency by factors between 1.6 and 2.1 compared to strong exploration, structured, and representation-learning baselines, while adding only about 20 percent computational overhead.These results establish structural reward learning as a promising new paradigm, with immediate implications for robotics, healthcare, and other safety-critical, sample-expensive applications.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.03790 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.03790 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.03790 in a Space README.md to link it from this page.