PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 221
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation Paper • 2510.23393 • Published Oct 27, 2025 • 21
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management Paper • 2508.21433 • Published Aug 29, 2025 • 7
PIPer Collection All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1, 2025 • 3
PIPer: On-Device Environment Setup via Online Reinforcement Learning Paper • 2509.25455 • Published Sep 29, 2025 • 38
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30, 2025 • 51
ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models Paper • 2505.22569 • Published May 28, 2025 • 55
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 191
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 95
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 132
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29, 2025 • 98
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20, 2025 • 195
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17, 2024 • 69
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17, 2024 • 25
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published Jun 13, 2024 • 89