Abstract
InfoSeek is a scalable framework for generating complex Deep Research tasks by synthesizing hierarchical constraint satisfaction problems, enabling models to outperform larger baselines on challenging benchmarks.
Large language models (LLMs) are increasingly expected to go beyond simple factual queries toward Deep Research-tasks that require decomposing questions into sub-problems, coordinating multi-step reasoning, and synthesizing evidence from diverse sources. We formalize Deep Research tasks with verifiable answers as Hierarchical Constraint Satisfaction Problems (HCSPs), which are fundamentally different from single-constraint, multi-hop, or flat CSP formulations. However, existing benchmarks (e.g., Natural Questions, HotpotQA) fail to capture this complexity, while recent synthetic datasets often introduce shortcut reasoning, knowledge leakage, or lack sufficient structural depth. To address this gap, we introduce InfoSeek, a scalable framework for synthesizing complex Deep Research tasks. InfoSeek uses a dual-agent system to recursively build a Research Tree from large-scale webpages, blurring intermediate nodes into valid sub-problems, and converting these trees into natural language questions that require traversing the full hierarchy. It also enables rapid scaling, yielding over 50K training examples, a curated test set, and reasoning trajectories generated via reject sampling. Experiments show that models trained on InfoSeek consistently outperform strong baselines. On a challenging benchmark BrowseComp-Plus, 3B LLMs optimized with InfoSeek surpass much larger 32B models and lightweight commercial APIs (e.g., Gemini2.5-Flash), while achieving performance comparable to stronger APIs (e.g., Gemini2.5-Pro). By preserving meta-information such as intermediate steps and retrieval labels, InfoSeek further supports advanced optimization strategies, including compound reward design and trajectory-level exploration. We provide our codes and datasets in https://github.com/VectorSpaceLab/InfoSeek{this repository}.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges (2025)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs (2025)
- Hybrid Deep Searcher: Integrating Parallel and Sequential Search Reasoning (2025)
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization (2025)
- EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes (2025)
- GRAIL:Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning (2025)
- Multimodal Iterative RAG for Knowledge Visual Question Answering (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper