Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning Paper • 2310.19308 • Published Oct 30, 2023
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes Paper • 2210.11604 • Published Oct 20, 2022
Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques Paper • 2409.00717 • Published Sep 1, 2024
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback Paper • 2503.08942 • Published Mar 11, 2025