arxiv:2402.02017

Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Published on Feb 3, 2024

Authors:

Jeonghye Kim ,

Abstract

QCS combines RCSL stability with Q-function stitching ability to enhance offline reinforcement learning performance across benchmarks.

AI-generated summary

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce Q-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of Q-functions. By analyzing Q-function over-generalization, which impairs stable stitching, QCS adaptively integrates Q-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.02017 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.02017 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.02017 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.