Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 6 days ago • 42
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis Paper • 2503.23145 • Published Mar 29 • 36
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving Paper • 2502.20238 • Published Feb 27 • 24