Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions Paper • 2503.23278 • Published Mar 30 • 1
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits Paper • 2504.03767 • Published Apr 2 • 4
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Paper • 2504.16656 • Published 19 days ago • 55
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 30
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published Apr 2 • 36
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published Mar 31 • 38
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published Mar 31 • 19
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31 • 54
CLS-RL: Image Classification with Rule-Based Reinforcement Learning Paper • 2503.16188 • Published Mar 20 • 9
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20 • 27
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 126
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 29
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 36
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 30
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 28
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 44