Add library_name and pipeline_tag
#1 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,6 +1,9 @@
 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <div align="center">
 <h1 align="center"> KnowRL </h1>
 <h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>
@@ -62,8 +65,7 @@ print(response)
 ### Using `huggingface-cli`
 You can also download the model from the command line using `huggingface-cli`.
 
-```
-bash
+```bash
 huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
 ```
 
@@ -72,7 +74,7 @@ huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir K
 The model's training process involves two distinct stages, using the data from the `zjunlp/KnowRL-Train-Data` dataset.
 
 * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
-* **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `KnowRL_RLtrain_data_withknowledge.json` and `knowrl_RLdata.json` files.
+* **Stage 2: Knowledgeable Reinforcement Learning (RL)**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `KnowRL_RLtrain_data_withknowledge.json` and `knowrl_RLdata.json` files.
 
 For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
 
@@ -87,5 +89,4 @@ bibtex
 author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
 journal={arXiv preprint arXiv:2506.19807},
 year={2025}
-}
-```
+}
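As context for the Stage 2 wording in the diff (a correctness reward combined with a factuality reward), the combination can be sketched as below. The function name, the simple weighted-sum scheme, and the `w_fact` parameter are illustrative assumptions, not the repository's actual reward implementation; see the KnowRL GitHub repository for the real GRPO reward code.

```python
def combined_reward(correctness: float, factuality: float, w_fact: float = 0.5) -> float:
    """Sketch of combining a correctness reward with a factuality reward.

    The convex-combination weighting here is an assumption for illustration;
    KnowRL's actual reward is defined in the zjunlp/KnowRL repository.
    """
    # Both inputs are assumed to be scalar scores in [0, 1]:
    # `correctness` from answer checking, `factuality` from verifying the
    # model's thinking process against an external knowledge base.
    return (1.0 - w_fact) * correctness + w_fact * factuality
```

With `w_fact = 0.5` a fully correct but unfactual trace scores 0.5, so the factuality term meaningfully shapes the GRPO advantage rather than being dominated by correctness alone.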