Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning • Paper • 2508.08221 • Published 16 days ago
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper • 2508.05629 • Published 20 days ago
panda-gym: Open-source goal-conditioned environments for robotic learning • Paper • 2106.13687 • Published Jun 25, 2021
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning • Paper • 2402.03046 • Published Feb 5, 2024
Distributional Preference Alignment of LLMs via Optimal Transport • Paper • 2406.05882 • Published Jun 9, 2024
Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training • Article • By siro1 and 4 others • 20 days ago
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models • Paper • 2508.06471 • Published 19 days ago
EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes • Paper • 2508.00180 • Published 27 days ago
Vision Language Model Alignment in TRL ⚡️ • Article • By sergiopaniego and 4 others • 21 days ago
gpt-oss • Collection • Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 20 days ago
Welcome GPT OSS, the new open-source model family from OpenAI! • Article • By reach-vb and 11 others • 23 days ago
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face • Article • By abidlabs and 4 others • 30 days ago
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published Nov 15, 2024
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published Jun 2, 2025