Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training • Paper • 2505.14681 • Published May 20, 2025 • 10
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement • Paper • 2503.16024 • Published Mar 20, 2025 • 1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning • Paper • 2505.23754 • Published May 29, 2025 • 15
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards • Paper • 2505.13445 • Published May 19, 2025
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents • Paper • 2507.03112 • Published Jul 3, 2025 • 32
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards • Paper • 2507.17147 • Published Jul 23, 2025
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models • Paper • 2503.02875 • Published Mar 4, 2025 • 1
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models • Paper • 2509.09675 • Published Sep 11, 2025 • 28
VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization • Paper • 2505.10917 • Published May 16, 2025
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs • Paper • 2509.26514 • Published Sep 30, 2025 • 3
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains • Paper • 2511.04962 • Published Nov 7, 2025 • 53
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning • Paper • 2509.16548 • Published Sep 20, 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms • Paper • 2505.20322 • Published May 23, 2025 • 14
CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration • Paper • 2406.13381 • Published Jun 19, 2024
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs • Paper • 2412.21187 • Published Dec 30, 2024 • 40
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls • Paper • 2502.11183 • Published Feb 16, 2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning • Paper • 2504.11456 • Published Apr 15, 2025 • 12
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models • Paper • 2505.02847 • Published May 1, 2025 • 28
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning • Paper • 2504.19162 • Published Apr 27, 2025 • 18