DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 150
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 118
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published Aug 14, 2025 • 19
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 42
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published May 22, 2025 • 41
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21, 2025 • 53
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection Paper • 2505.07293 • Published May 12, 2025 • 28
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7, 2025 • 44
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild Paper • 2503.18892 • Published Mar 24, 2025 • 31
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23, 2025 • 48