Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published 6 days ago • 74
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 10 days ago • 33
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published 6 days ago • 24
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 10 days ago • 33
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 54
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 43
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 103
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 84
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 43
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs Paper • 2507.05687 • Published Jul 8, 2025 • 30