lbourdois committed
Commit 5659b54 · verified · 1 Parent(s): 7d28cca

Improve language tag


Hi! As the model is multilingual, this is a PR to add languages other than English to the language tag, in order to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1)
  1. README.md +184 -170
README.md CHANGED
@@ -1,171 +1,185 @@
---
license: mit
datasets:
- RUC-NLPIR/FlashRAG_datasets
base_model:
- Qwen/Qwen2.5-32B
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
---
<div align="center">

# ***ReSearch***: Learning to ***Re***ason with ***Search*** for LLMs via Reinforcement Learning

[![Arxiv](https://img.shields.io/badge/paper-A82F27?style=for-the-badge&logo=arxiv)](https://arxiv.org/abs/2503.19470)
<!-- [![Model](https://img.shields.io/badge/model-4169E1?style=for-the-badge&logo=huggingface)](https://arxiv.org/abs/2503.19470) -->

</div>

<p align="center">
  <img src="./assets/intro_bar.png" width="90%" alt="Intro" />
  <img src="./assets/method.png" width="90%" alt="Method" />
</p>

We propose ***ReSearch***, a novel framework that trains LLMs to ***Re***ason with ***Search*** via reinforcement learning, without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain: when and how to perform searches is guided by text-based thinking, and search results in turn influence further reasoning.

## 📰 News
- **[2025-03-26]** 🎉 We release the paper, update the code, and open-source the models.
  - 📝 The **paper is released** on arXiv; more details and evaluation results can be found in our [paper](https://arxiv.org/abs/2503.19470).
  - 🛠️ The **repository is updated** with the new implementation, especially the rollout with search during RL training. This version of the implementation is based on the latest release of verl.
- **[2025-03-03]** ✅ We released a preview version of the ReSearch implementation.

## 📦 Installation

We recommend using conda to manage the environment. First, create a conda environment and activate it:
```bash
conda create -n re-search python==3.10
conda activate re-search
```
Then install the dependencies; our modified verl and flashrag packages under `src/` will be installed in editable mode. See `setup.py` for details.
```bash
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/Agent-RL/ReSearch.git
cd ReSearch
pip3 install -e .
```
As described in the [FlashRAG installation guide](https://github.com/RUC-NLPIR/FlashRAG?tab=readme-ov-file#wrench-installation), faiss cannot be reliably installed via pip, so use the following conda command to install faiss-gpu:
```bash
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
```
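
To sanity-check the environment, you can verify that PyTorch sees your GPUs and that faiss-gpu imports cleanly (this quick check is our suggestion, not part of the original setup):
```bash
# Confirm CUDA is visible to PyTorch and faiss-gpu is installed correctly.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import faiss; print('faiss GPUs:', faiss.get_num_gpus())"
```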

## 🚀 Quick Start

### Retriever Serving

As described in our paper, search operations are performed during rollout and inference in both training and evaluation. In practice, we host a retriever service via FlashRAG and FastAPI, so the search operation is standardized as an API call. This decouples search from the reinforcement learning process, making training and evaluation clearer and more flexible.

Before starting the retriever serving, you need to download the [pre-indexed Wikipedia](https://github.com/RUC-NLPIR/FlashRAG?tab=readme-ov-file#index), the [Wikipedia corpus, and the corresponding retriever models](https://github.com/RUC-NLPIR/FlashRAG/blob/main/docs/original_docs/reproduce_experiment.md#preliminary). More details can be found in the FlashRAG documentation.

To start the retriever serving, first fill in `scripts/serving/retriever_config.yaml` with the correct paths to the retrieval model, index, and corpus, as well as the available GPU IDs. Then run the following command:
```bash
cd scripts/serving
python retriever_serving.py \
    --config retriever_config.yaml \
    --num_retriever {num_retriever} \
    --port {port}
```
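
Since the search operation is standardized as an API call, you can sanity-check the running service with a direct HTTP request. The route and payload below are illustrative assumptions, not a documented interface; check `retriever_serving.py` for the actual endpoint and parameters:
```bash
# Hypothetical query against the hosted retriever; endpoint and fields are assumptions.
curl -X POST "http://localhost:{port}/search" \
    -H "Content-Type: application/json" \
    -d '{"query": "Who wrote the opera Carmen?", "top_n": 5}'
```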

The running retriever service will be used during both training and evaluation below.

### Data Preparation

*ReSearch* is trained on the training set of MuSiQue, and evaluated on the dev sets of HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle. To download the datasets, refer to the `data/download_dataset.sh` script:
```bash
cd data
bash download_dataset.sh
```

To prepare the training and validation data for the subsequent reinforcement learning, run this script to convert the MuSiQue dataset to parquet format:
```bash
cd data
python prepare_musique.py
```
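
To see what the converted training data looks like, you can inspect the generated parquet with pandas (the output file name below is an assumption; check `prepare_musique.py` for the actual path):
```bash
# Peek at the first rows of the converted dataset; the file name is an assumption.
python -c "import pandas as pd; print(pd.read_parquet('data/musique_train.parquet').head())"
```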

### Training

Our training framework is based on [verl](https://github.com/volcengine/verl), a powerful reinforcement learning framework for LLMs. We deeply customize the verl code to fit our needs; the modified version of verl is under the `src/verl` directory. Example training scripts are under `scripts/train`.

#### Single-node training
Here is an example of training Qwen2.5-7B-Instruct with 4 GPUs locally. Note that the training script below **is just an example** for single-node training: it uses a small batch size for a quick start and does not guarantee training performance.
```bash
cd scripts/train
bash train.sh \
    --train_batch_size 8 \
    --ppo_mini_batch_size 8 \
    --apply_chat True \
    --prompt_template_name re_search_template_sys \
    --actor_model_path {model/path/to/qwen2.5-7b-instruct} \
    --search_url {your-hosted-retriever-url} \
    --project_name {wandb-project-name} \
    --experiment_name {wandb-experiment-name} \
    --nnodes 1 \
    --n_gpus_per_node 4 \
    --save_freq 5 \
    --test_freq 5 \
    --total_epochs 2 \
    --wandb_api_key {your-wandb-api-key} \
    --save_path {path/to/save} \
    --train_files {path/to/train/parquet/data} \
    --test_files {path/to/test/parquet/data}
```
- For training base (pre-trained) models, use `--apply_chat False` and `--prompt_template_name re_search_template`.
- For training instruction-tuned models, use `--apply_chat True` and `--prompt_template_name re_search_template_sys`.

#### Multi-node training

If you want to **fully reproduce** the results in our paper, please refer to the multi-node training script in `scripts/train/train_multi_node.sh`, as well as the implementation details in our paper.

### Evaluation

We recommend using [SGLang](https://docs.sglang.ai/) to serve the trained model. You can download our open-sourced models or train your own models for the evaluation. Here is an example of launching the model serving:
```bash
python3 -m sglang.launch_server \
    --served-model-name {trained/model/name} \
    --model-path {trained/model/path} \
    --tp 2 \
    --context-length 8192 \
    --enable-metrics \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 80 \
    --trust-remote-code \
    --disable-overlap \
    --disable-radix-cache
```
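
Once the server is up, you can confirm it is serving the model before running evaluation. SGLang exposes a health endpoint and an OpenAI-compatible API, so a quick check looks like this (host and port match the launch command above):
```bash
# Check server liveness and list the served model.
curl http://0.0.0.0:80/health
curl http://0.0.0.0:80/v1/models
```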

We use [FlashRAG](https://github.com/RUC-NLPIR/FlashRAG) as the standard evaluation environment. Here is an example of evaluating the performance of ReSearch-Qwen-7B-Instruct on the Bamboogle test set:
```bash
cd scripts/evaluation
python run_eval.py \
    --config_path eval_config.yaml \
    --method_name research \
    --data_dir {root/path/to/evaluation/data} \
    --dataset_name bamboogle \
    --split test \
    --save_dir {your-save-dir} \
    --save_note research_qwen7b_ins \
    --sgl_remote_url {your-launched-sgl-url} \
    --remote_retriever_url {your-hosted-retriever-url} \
    --generator_model {your-local-model-path} \
    --apply_chat True
```

For base models, use `--apply_chat False`; for instruction-tuned models, use `--apply_chat True`, so that the correct prompt template is loaded when evaluating *ReSearch* models. For more details about the configuration, please refer to the `scripts/evaluation/eval_config.yaml` file.

## 🤝 Acknowledgements

This training implementation is based on [verl](https://github.com/volcengine/verl) and the evaluation is based on [FlashRAG](https://github.com/RUC-NLPIR/FlashRAG). The retriever serving is based on [FastAPI](https://github.com/fastapi/fastapi), and the model serving is based on [SGLang](https://docs.sglang.ai/). *ReSearch* models are trained on top of [Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/). We sincerely appreciate their contributions to the open-source community.

## 📚 Citation

If you find this work useful, please cite it as follows:
```bibtex
@misc{chen2025research,
      title={ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning},
      author={Mingyang Chen and Tianpeng Li and Haoze Sun and Yijie Zhou and Chenzheng Zhu and Haofen Wang and Jeff Z. Pan and Wen Zhang and Huajun Chen and Fan Yang and Zenan Zhou and Weipeng Chen},
      year={2025},
      eprint={2503.19470},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.19470},
}
```