TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression Paper • 2506.02678 • Published 23 days ago • 5
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published 9 days ago • 35
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Paper • 2506.08989 • Published 16 days ago • 14
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO Paper • 2505.16673 • Published May 22 • 2
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning Paper • 2410.14208 • Published Oct 18, 2024 • 3
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 45
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17 • 29
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging Paper • 2503.02783 • Published Mar 4 • 6
Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance Paper • 2406.15330 • Published Jun 21, 2024
Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training Paper • 2411.14318 • Published Nov 21, 2024
EpiCoder: Encompassing Diversity and Complexity in Code Generation Paper • 2501.04694 • Published Jan 8 • 16
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published Dec 30, 2024 • 38
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 40
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 67
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 51