- RLHF Workflow: From Reward Modeling to Online RLHF (Paper • 2405.07863 • Published • 67)
- Understanding and Diagnosing Deep Reinforcement Learning (Paper • 2406.16979 • Published • 9)
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (Paper • 2404.03715 • Published • 60)
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning (Paper • 2407.00617 • Published • 7)
Collections including paper arxiv:2406.16979
- WPO: Enhancing RLHF with Weighted Preference Optimization (Paper • 2406.11827 • Published • 14)
- Self-Improving Robust Preference Optimization (Paper • 2406.01660 • Published • 18)
- Bootstrapping Language Models with DPO Implicit Rewards (Paper • 2406.09760 • Published • 38)
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM (Paper • 2406.12168 • Published • 7)
- In deep reinforcement learning, a pruned network is a good network (Paper • 2402.12479 • Published • 17)
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL (Paper • 2403.03950 • Published • 13)
- RLHF Workflow: From Reward Modeling to Online RLHF (Paper • 2405.07863 • Published • 67)
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (Paper • 2405.11143 • Published • 33)
- Diffusion World Model (Paper • 2402.03570 • Published • 7)
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF (Paper • 2401.16335 • Published • 1)
- Towards Efficient and Exact Optimization of Language Model Alignment (Paper • 2402.00856 • Published)
- ODIN: Disentangled Reward Mitigates Hacking in RLHF (Paper • 2402.07319 • Published • 13)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling (Paper • 2401.06080 • Published • 25)
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms (Paper • 2406.02900 • Published • 10)
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (Paper • 2406.04151 • Published • 17)
- Understanding and Diagnosing Deep Reinforcement Learning (Paper • 2406.16979 • Published • 9)