Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper • 2509.07980 • Published 3 days ago • 84
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning • Paper • 2509.02492 • Published 10 days ago • 1
GRAM: A Generative Foundation Reward Model for Reward Generalization • Paper • 2506.14175 • Published Jun 17 • 1
GRAM • Collection • Generative Foundation Reward Models for Reward Generalization • 8 items • Updated Jun 19 • 1
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data • Paper • 2408.12109 • Published Aug 22, 2024 • 1
A Controlled Study on Long Context Extension and Generalization in LLMs • Paper • 2409.12181 • Published Sep 18, 2024 • 45