Papers
arxiv:2508.07976

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Published on Aug 11
· Submitted by xssstory on Aug 13
#2 Paper of the day
Abstract

ASearcher is an open-source project that uses scalable asynchronous RL training to enhance search agents, achieving high performance on QA tasks with long-horizon search capabilities.

AI-generated summary

Recent advancements in LLM-based agents have demonstrated remarkable capabilities in handling complex, knowledge-intensive tasks by integrating external tools. Among diverse choices of tools, search tools play a pivotal role in accessing vast external knowledge. However, open-source agents still fall short of achieving expert-level Search Intelligence: the ability to resolve ambiguous queries, generate precise searches, analyze results, and conduct thorough exploration. Existing approaches are limited in scalability, efficiency, and data quality. For example, the small turn limits in existing online RL methods (e.g., ≤10) restrict complex strategy learning. This paper introduces ASearcher, an open-source project for large-scale RL training of search agents. Our key contributions include: (1) Scalable fully asynchronous RL training that enables long-horizon search while maintaining high training efficiency. (2) A prompt-based LLM agent that autonomously synthesizes high-quality and challenging QAs, creating a large-scale QA dataset. Through RL training, our prompt-based QwQ-32B agent achieves substantial improvements, with 46.7% and 20.8% Avg@4 gains on xBench and GAIA, respectively. Notably, our agent exhibits extreme long-horizon search, with tool calls exceeding 40 turns and output tokens exceeding 150k during training time. With a simple agent design and no external LLMs, ASearcher-Web-QwQ achieves Avg@4 scores of 42.1 on xBench and 52.8 on GAIA, surpassing existing open-source 32B agents. We open-source our models, training data, and code at https://github.com/inclusionAI/ASearcher.

Community

Paper submitter

🔍We introduce ASearcher, a search agent trained with end-to-end RL

Large-scale RL (up to 128 turns) with AReaL unlocks long-horizon agentic search
(+20.8% Avg@4 on GAIA, +40.6% on xBench)

💻Data, Code&Model: https://github.com/inclusionAI/ASearcher

📄Paper: https://arxiv.org/abs/2508.07976
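To make the long-horizon setting concrete, here is a minimal sketch of a search-agent loop with a large turn budget, in the spirit of ASearcher's simple agent design. The function signatures, message format, and tool names are illustrative assumptions, not the project's actual API.

```python
# Hypothetical long-horizon agent loop: the model alternates between tool calls
# (e.g. web search, page reading) and a final answer, up to `max_turns` turns.
def run_agent(llm, tools, question, max_turns=128):
    history = [{"role": "user", "content": question}]
    for turn in range(max_turns):
        reply = llm(history)  # model decides: call a tool, or answer
        if reply.get("final_answer") is not None:
            return reply["final_answer"]
        # Execute the requested tool and feed the observation back in.
        observation = tools[reply["tool"]](**reply["args"])
        history.append({"role": "assistant", "content": str(reply)})
        history.append({"role": "tool", "content": observation})
    return None  # turn budget exhausted without a final answer
```

With a limit of 128 rather than the ≤10 typical of prior online RL setups, trajectories can involve dozens of tool calls, which is what makes per-trajectory collection time so variable.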


Agentic RL (turn limit = 128) → high variance in trajectory collection time.
Batch RL waits for the slowest trajectory → slow training 💸

AReaL decouples training & trajectory collection →
✅ Near-100% GPU utilization
Substantial speedup!

📉 Fig: Fully Async Training vs. Batch Training
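The decoupling idea above can be sketched with a simple producer/consumer pattern: rollout workers push finished trajectories into a queue as soon as they complete, and the trainer consumes batches independently, so no step waits on the slowest trajectory. This is a minimal illustration of the scheme, not AReaL's actual implementation.

```python
import queue
import threading

def collector(traj_queue, collect_fn, stop):
    # Rollout worker: pushes each finished trajectory immediately,
    # so one slow trajectory never stalls the others.
    while not stop.is_set():
        traj_queue.put(collect_fn())

def trainer(traj_queue, train_step, batch_size, num_steps):
    # Trainer: consumes whichever trajectories are ready.
    # The GPU stays busy as long as the queue is non-empty.
    for _ in range(num_steps):
        batch = [traj_queue.get() for _ in range(batch_size)]
        train_step(batch)
```

In batch RL, by contrast, the equivalent of `train_step` only runs after *all* trajectories in the batch finish, which with a 128-turn limit means idling on the longest rollout.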


We build a Data Synthesis Agent for automatic QA-pair generation:

Two key actions for high-quality QA synthesis:

  • Fuzzing (obscure key details)
  • Fact Injection (add external facts)

✅ Rigorous validation ensures QA quality & difficulty.
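A rough sketch of the two synthesis actions described above, applied to a seed QA pair. The helper names and prompt wording are assumptions for illustration; the actual agent is prompt-based and LLM-driven.

```python
def fuzz(llm, qa):
    # Fuzzing: obscure a key detail so the question can no longer be
    # answered from the surface form alone and requires search.
    prompt = ("Rewrite this question, replacing one named entity "
              f"with a vague description:\n{qa['question']}")
    return {"question": llm(prompt), "answer": qa["answer"]}

def inject_fact(llm, qa, fact):
    # Fact injection: weave an external fact into the question,
    # so answering requires an extra retrieval hop.
    prompt = ("Rewrite this question so answering it also requires "
              f"knowing the following fact: {fact}\n{qa['question']}")
    return {"question": llm(prompt), "answer": qa["answer"]}
```

Both actions keep the ground-truth answer fixed while making the question harder, which is what lets a downstream validation step check that the QA pair is still answerable.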



Models citing this paper: 5
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 1