Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning • Paper • 2504.15275 • Published Apr 21
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining • Paper • 2410.00564 • Published Oct 1, 2024