Add library_name and pipeline_tag
#1 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,6 +1,9 @@
 ---
 license: mit
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <div align="center">
 <h1 align="center"> KnowRL </h1>
 <h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>
@@ -62,8 +65,7 @@ print(response)
 ### Using `huggingface-cli`
 You can also download the model from the command line using `huggingface-cli`.
 
-```
-bash
+```bash
 huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
 ```
 
@@ -72,7 +74,7 @@ huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir K
 The model's training process involves two distinct stages, using the data from the `zjunlp/KnowRL-Train-Data` dataset.
 
 * **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
-* **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `KnowRL_RLtrain_data_withknowledge.json` and `knowrl_RLdata.json` files.
+* **Stage 2: Knowledgeable Reinforcement Learning (RL)**: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base. This stage uses the `KnowRL_RLtrain_data_withknowledge.json` and `knowrl_RLdata.json` files.
 
 For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
 
@@ -87,5 +89,4 @@ bibtex
 author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
 journal={arXiv preprint arXiv:2506.19807},
 year={2025}
-}
-```
+}
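As context for the Stage 2 wording in the diff (a correctness reward combined with a factuality reward), the combination can be sketched as below. The function name, the simple weighted-sum scheme, and the `w_fact` parameter are illustrative assumptions, not the repository's actual reward implementation; see the KnowRL GitHub repository for the real GRPO reward code.

```python
def combined_reward(correctness: float, factuality: float, w_fact: float = 0.5) -> float:
    """Sketch of combining a correctness reward with a factuality reward.

    The convex-combination weighting here is an assumption for illustration;
    KnowRL's actual reward is defined in the zjunlp/KnowRL repository.
    """
    # Both inputs are assumed to be scalar scores in [0, 1]:
    # `correctness` from answer checking, `factuality` from verifying the
    # model's thinking process against an external knowledge base.
    return (1.0 - w_fact) * correctness + w_fact * factuality
```

With `w_fact = 0.5` a fully correct but unfactual trace scores 0.5, so the factuality term meaningfully shapes the GRPO advantage rather than being dominated by correctness alone.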