SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published 2 days ago • 12
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Paper • 2505.14684 • Published 4 days ago • 19
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published 2 days ago • 97
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published 3 days ago • 47
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20 • 27
Understanding and Diagnosing Deep Reinforcement Learning Paper • 2406.16979 • Published Jun 23, 2024 • 9