SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28 • 123
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • Paper • 2504.06958 • Published Apr 9 • 11
ZeroSearch: Incentivize the Search Capability of LLMs without Searching • Paper • 2505.04588 • Published May 7 • 65
Improving Editability in Image Generation with Layer-wise Memory • Paper • 2505.01079 • Published May 2 • 28
RLVR-World: Training World Models with Reinforcement Learning • Paper • 2505.13934 • Published May 20 • 14
s3: You Don't Need That Much Data to Train a Search Agent via RL • Paper • 2505.14146 • Published May 20 • 17