---
license: mit
---

<div align="center">

<h1 align="center"> KnowRL </h1>

<h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>

<p align="center">
<a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> •
<a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> •
<a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a>
</p>

</div>

---

## Model Description

**KnowRL-Skywork-OR1-7B-Preview** is a slow-thinking language model obtained by applying our **KnowRL** framework to the base model `Skywork-OR1-7B-Preview`.

The **KnowRL (Knowledgeable Reinforcement Learning)** framework mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. The model is trained in two stages:

1. **Cold-Start Supervised Fine-Tuning (SFT)**: The model first aligns with factual, slow-thinking response patterns on a high-quality dataset.
2. **Knowledgeable Reinforcement Learning (RL)**: The model is then trained with a reward signal that explicitly encourages factual accuracy in its reasoning process, helping it learn its own knowledge boundaries.

As a result, this model shows a significant reduction in hallucinations on factuality benchmarks while preserving, or even enhancing, the strong reasoning capabilities inherited from its base model.

## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation tasks. It is important to follow the specific prompt format, which includes `<think>` and `<answer>` tags, to get the best results.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Build the prompt with the model's chat template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
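
If the model's reasoning is wrapped in `<think>` tags and its final answer in `<answer>` tags, as the prompt format above suggests, you may want to separate the two parts after decoding. The helper below is a minimal sketch, assuming these tags appear verbatim in the decoded text (they may be stripped if they are registered as special tokens):

```python
import re

# Illustrative helper, assuming the output contains <think>...</think> and <answer>...</answer> tags.
def split_think_answer(text: str):
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else text.strip(),
    )

thinking, answer = split_think_answer(response)
print("Reasoning:", thinking)
print("Answer:", answer)
```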

### Using `huggingface-cli`

You can also download the model from the command line using `huggingface-cli`.

```bash
huggingface-cli download zjunlp/KnowRL-Skywork-OR1-7B-Preview --local-dir KnowRL-Skywork-OR1-7B-Preview
```
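
If you prefer to stay in Python, the `huggingface_hub` library offers an equivalent download path; this is a minimal sketch, and the local directory name is just an example:

```python
from huggingface_hub import snapshot_download

# Download the full model repository to a local directory of your choice
local_path = snapshot_download(
    repo_id="zjunlp/KnowRL-Skywork-OR1-7B-Preview",
    local_dir="KnowRL-Skywork-OR1-7B-Preview",  # example path
)
print(local_path)
```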

## Training Details

The model is trained in two distinct stages, using data from the `zjunlp/KnowRL-Train-Data` dataset.

* **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
* **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained with reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is computed by verifying the model's thinking process against an external knowledge base (see the sketch after this list). This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
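
For illustration only, the sketch below shows one way a combined correctness-plus-factuality reward could be assembled. The helper functions, weights, and scoring rules are hypothetical placeholders, not the verifiers used in KnowRL; refer to the GitHub repository for the actual reward implementation.

```python
# Hypothetical sketch of a combined reward; the helpers are simple placeholders.

def check_answer(answer: str, reference: str) -> float:
    """Placeholder correctness reward: exact (case-insensitive) match."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def verify_against_knowledge(thinking: str, knowledge: str) -> float:
    """Placeholder factuality reward: fraction of sentences in the thinking
    trace whose words all appear in the provided knowledge text."""
    sentences = [s.strip() for s in thinking.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        all(word.lower() in knowledge.lower() for word in s.split())
        for s in sentences
    )
    return supported / len(sentences)

def combined_reward(thinking: str, answer: str, reference: str, knowledge: str,
                    w_correct: float = 1.0, w_fact: float = 1.0) -> float:
    """Weighted sum of the correctness and factuality rewards."""
    return (w_correct * check_answer(answer, reference)
            + w_fact * verify_against_knowledge(thinking, knowledge))
```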

For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).

---

## Citation

If you find this model useful in your research, please consider citing our paper:

```bibtex
@article{ren2025knowrl,
  title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```
|