InnoGym: Benchmarking the Innovation Potential of AI Agents • Paper • 2512.01822 • Published Dec 1, 2025 • 35 • 2
LightMem: Lightweight and Efficient Memory-Augmented Generation • Paper • 2510.18866 • Published Oct 21, 2025 • 111 • 3
Executable Knowledge Graphs for Replicating AI Research • Paper • 2510.17795 • Published Oct 20, 2025 • 14 • 2
OceanGym: A Benchmark Environment for Underwater Embodied Agents • Paper • 2509.26536 • Published Sep 30, 2025 • 34 • 2
Towards Personalized Deep Research: Benchmarks and Evaluations • Paper • 2509.25106 • Published Sep 29, 2025 • 29 • 1
Automating Steering for Safe Multimodal Large Language Models • Paper • 2507.13255 • Published Jul 17, 2025 • 3 • 1
ReCode: Updating Code API Knowledge with Reinforcement Learning • Paper • 2506.20495 • Published Jun 25, 2025 • 9 • 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality • Paper • 2506.19807 • Published Jun 24, 2025 • 7 • 1
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study • Paper • 2506.19794 • Published Jun 24, 2025 • 8 • 1
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science • Paper • 2506.10974 • Published Jun 12, 2025 • 19 • 2
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark • Paper • 2506.10960 • Published Jun 12, 2025 • 12 • 2
Spatial Knowledge Graph-Guided Multimodal Synthesis • Paper • 2505.22633 • Published May 28, 2025 • 4 • 1
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms • Paper • 2505.20322 • Published May 23, 2025 • 14 • 2
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training • Paper • 2505.14681 • Published May 20, 2025 • 10 • 2
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey • Paper • 2505.03418 • Published May 6, 2025 • 9 • 1