DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published 29 days ago • 79
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20 • 53
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26 • 29
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature Paper • 2509.16591 • Published Sep 20 • 2
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 227
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 141
OpenMathReasoning Collection Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated 3 days ago • 46