VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published Jun 3 • 57
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published May 30 • 26
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores Paper • 2306.16688 • Published Jun 29, 2023
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16, 2024 • 6