Deep Research agents are quickly becoming our daily co-workers — built for complex investigations, not just chat. With modular architecture, advanced tool use and real web access, they go far beyond typical AI. While big-name agents get the spotlight, we want to highlight some powerful recent open-source alternatives:
1. DeerFlow -> https://github.com/bytedance/deer-flow A modular multi-agent system combining LMs and tools for automated research and code analysis. It links a coordinator, a planner, a team of specialized agents, and a reporter, and converts reports to speech via Text-to-Speech (TTS)
2. Alita -> https://github.com/CharlesQ9/Alita Uses a single problem-solving module for scalable reasoning through simplicity. It self-evolves by generating and reusing Model Context Protocols (MCPs) from open-source tools to build external capabilities for diverse tasks
3. WebThinker -> https://github.com/RUC-NLPIR/WebThinker Lets reasoning models autonomously search the web and navigate pages. Its Deep Web Explorer allows interaction with links and follow-up searches. Through a Think-Search-and-Draft process, models generate and refine reports in real time. RL training with preference pairs improves the workflow
4. SimpleDeepSearcher -> https://github.com/RUCAIBox/SimpleDeepSearcher A lightweight framework showing that supervised fine-tuning is a real alternative to complex RL, using simulated web interactions and multi-criteria curation to generate high-quality training data
5. AgenticSeek -> https://github.com/Fosowl/agenticSeek A private, on-device assistant that picks the best agent expert for browsing, coding, or planning—no cloud needed. Includes voice input via speech-to-text
6. Suna -> https://github.com/kortix-ai/suna Offers web browsing, file and doc handling, CLI execution, site deployment, and API/service integration—all in one assistant
Subscribe to the Turing Post: https://www.turingpost.com/subscribe Read further ⬇️
Even though the perplexity scores of the pruned version are 3 times higher, the ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores are holding up remarkably well, considering two layers were removed (5 and 39). This seems to support Xin Men et al.'s conclusions in ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853).
A results summary is in the model card, and the test results are in the ./scores directory. Questions and feedback are always welcome.
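For readers curious how layers might be chosen for removal: the post does not say how layers 5 and 39 were picked, but the ShortGPT paper ranks layers by a Block Influence (BI) score, defined as one minus the average cosine similarity between a layer's input and output hidden states. Below is a minimal sketch of that idea on toy data. The function names (`block_influence`, `least_influential`, `drop_layers`) and the hand-made hidden states are illustrative assumptions, not code from the pruned model or from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(hidden_in, hidden_out):
    """BI = 1 - mean cosine similarity between a layer's input and output
    hidden states, averaged over tokens. A low BI means the layer barely
    changes the direction of its input."""
    sims = [cosine(x, y) for x, y in zip(hidden_in, hidden_out)]
    return 1.0 - sum(sims) / len(sims)


def least_influential(states, k):
    """states[i] holds the token vectors entering layer i; states[-1] is the
    output of the last layer. Returns (indices of the k lowest-BI layers, all scores)."""
    scores = [block_influence(states[i], states[i + 1])
              for i in range(len(states) - 1)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k]), scores


def drop_layers(layers, to_remove):
    """Return a new layer list without the given indices."""
    remove = set(to_remove)
    return [layer for i, layer in enumerate(layers) if i not in remove]


# Toy example (hypothetical data): 3 layers, 2 tokens, 2-dimensional states.
states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # after layer 0: rotated by 90 degrees
    [[0.0, 2.0], [2.0, 0.0]],  # after layer 1: rescaled only, same direction
    [[1.0, 1.0], [1.0, 1.0]],  # after layer 2: rotated by 45 degrees
]
layers = ["layer0", "layer1", "layer2"]

removed, scores = least_influential(states, k=1)
pruned = drop_layers(layers, removed)
print(removed, pruned)
```

In this toy setup, layer 1 only rescales its input, so it gets the lowest score and is the one dropped. With a real model, the hidden states would come from running calibration text through the network, and the benchmarks should be re-run after pruning, as was done here.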