100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published 8 days ago • 30
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization Paper • 2502.16825 • Published Feb 24 • 6
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 5 • 46