catchiz committed · commit edc749a (verified) · parent: 5f6b5a2

Create README.md

Files changed (1): README.md added (+80 lines)
---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# GAIR/DeepResearcher-7b

## Introduction

DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents by scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information across multiple sources, engage in self-reflection to redirect research, and remain honest when no definitive answer can be found.

## Model Details

- **License:** Apache 2.0
- **Model type:** Large language model (LLM) fine-tuned with reinforcement learning.
- **Language(s):** English.
- **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

### Model Sources

- **Repository:** [DeepResearcher GitHub](https://github.com/GAIR-NLP/DeepResearcher)
- **Paper:** [DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments](https://arxiv.org/abs/2504.03160)

## How to Get Started with the Model

Visit the [DeepResearcher repository](https://github.com/GAIR-NLP/DeepResearcher) on GitHub, where the model's code and setup instructions are provided.

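
Pending the repository's own instructions, the checkpoint should load like any Qwen2.5-based chat model via Hugging Face `transformers`. The sketch below is an assumption, not the project's documented entry point: the hub id `GAIR/DeepResearcher-7b` is taken from the title above, the generation settings are illustrative, and the agent's actual search/tool prompting lives in the GitHub repository.

```python
def build_messages(question: str) -> list[dict]:
    # Plain chat-format messages; DeepResearcher's real system prompt and
    # tool-use scaffolding are defined in the GitHub repository.
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so this sketch stays importable without torch/transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "GAIR/DeepResearcher-7b"  # assumed hub id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Note that run standalone, without the repository's search tooling, this behaves as a plain chat model rather than a research agent.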
## Training Details

### Training Data

The model was trained on open-domain question-answering datasets, including:

- **NaturalQuestions (NQ)**
- **TriviaQA (TQ)**
- **HotpotQA**
- **2WikiMultiHopQA**

### Training Procedure

DeepResearcher was trained using reinforcement learning with the Group Relative Policy Optimization (GRPO) algorithm. It was tested in both in-domain (NQ, TQ, HotpotQA) and out-of-domain (Musique, Bamboogle, PopQA) settings.

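
The group-relative normalization at the core of GRPO can be sketched as follows. This is a minimal illustration of the general GRPO advantage computation, not the authors' training code, and the reward values are made up:

```python
from statistics import mean, stdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO scores each rollout against its own sampling group:
    advantage_i = (r_i - mean(group)) / (std(group) + eps),
    so no separate value/critic network is needed."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# For one question, sample a group of rollouts and score each one
# (e.g. 1.0 if the final answer is judged correct, else 0.0):
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct rollouts in the group receive positive advantages and incorrect ones negative, and the group's advantages sum to zero, which is what makes the baseline "group-relative".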
## Evaluation

### Testing Data

The model was evaluated on several datasets, including:

- **NQ (Natural Questions)**
- **TQ (TriviaQA)**
- **HotpotQA**
- **2Wiki**
- **Musique**
- **Bamboogle**
- **PopQA**

### Results

DeepResearcher outperforms all baseline models, achieving substantial improvements in task completion across these datasets, particularly in out-of-domain scenarios.

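
Open-domain QA benchmarks of this kind are conventionally scored with answer normalization followed by exact match or token-level F1 (SQuAD-style). The sketch below shows that conventional scoring; it is an assumption about the metric, and the paper should be consulted for the exact evaluation protocol:

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```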
## Citation

```
@misc{zheng2025deepresearcherscalingdeepresearch,
      title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
      author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
      year={2025},
      eprint={2504.03160},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.03160},
}
```