License: MIT

KnowRL

Exploring Knowledgeable Reinforcement Learning for Factuality

  📄 arXiv • 💻 GitHub Repo • 📖 Dataset


Model Description

KnowRL-Skywork-OR1-7B-Preview is a slow-thinking language model that results from applying our KnowRL framework to the base model Skywork-OR1-7B-Preview.

The KnowRL (Knowledgeable Reinforcement Learning) framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. The model is trained in two stages:

  1. Cold-Start Supervised Fine-Tuning (SFT): The model first aligns with factual thinking patterns on a high-quality dataset.
  2. Knowledgeable Reinforcement Learning (RL): The model is then further trained using a reward signal that explicitly encourages factual accuracy in its reasoning process, helping it learn its own knowledge boundaries.

As a result, this model demonstrates a significant reduction in hallucinations on factual benchmarks while preserving or even enhancing the strong reasoning capabilities inherited from its base model.

How to Use

Using the transformers Library

You can use this model with the transformers library for text generation. For best results, it is important to follow the model's prompt format, which uses <think> and <answer> tags.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Define the prompt using the model's template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response (increase max_new_tokens if the reasoning trace gets cut off)
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print only the newly generated tokens (everything after the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
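
The response is expected to contain the model's reasoning inside <think>...</think> followed by the final answer inside <answer>...</answer>. The snippet below is a minimal sketch, continuing from the response string produced above, for pulling out just the answer; it assumes that tag structure is present and falls back to the full text otherwise.

import re

# Extract the content of the <answer> tag from the decoded response;
# fall back to the full response if the tag is missing.
match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
final_answer = match.group(1).strip() if match else response.strip()
print(final_answer)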

Using huggingface-cli

You can also download the model from the command line using huggingface-cli.

huggingface-cli download zjunlp/KnowRL-Skywork-OR1-7B-Preview --local-dir KnowRL-Skywork-OR1-7B-Preview
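
Once downloaded, the local directory can be passed to from_pretrained in place of the Hub model ID (standard transformers behavior). The path below simply assumes the --local-dir used in the command above.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the locally downloaded snapshot
local_dir = "KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)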

Training Details

The model is trained in two distinct stages, using data from the zjunlp/KnowRL-Train-Data dataset.

  • Stage 1: Cold-Start SFT: The base model undergoes supervised fine-tuning on the knowrl_coldstart.json dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
  • Stage 2: Knowledgeable RL: The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base (a simplified sketch of this combined reward is shown after this list). This stage uses the knowrl_RLdata.json and KnowRL_RLtrain_data_withknowledge.json files.
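
The actual reward implementation and hyperparameters live in the GitHub repository. The sketch below only illustrates the idea of combining the two signals; the helper functions is_answer_correct and fact_score and the equal 0.5/0.5 weighting are assumptions for this example, not the KnowRL code.

# Illustrative sketch of a combined correctness + factuality reward.
# is_answer_correct and fact_score are hypothetical stand-ins for the real
# checks (answer matching, and verification of the <think> content against
# an external knowledge base); the 0.5/0.5 weighting is assumed.

def is_answer_correct(answer: str, reference: str) -> float:
    # Placeholder: exact-match correctness check
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def fact_score(thinking: str, knowledge_base: dict) -> float:
    # Placeholder: fraction of sentences in the thinking trace that are
    # supported by the external knowledge base
    claims = [c.strip() for c in thinking.split(".") if c.strip()]
    if not claims:
        return 0.0
    supported = sum(1 for c in claims if c in knowledge_base.get("facts", set()))
    return supported / len(claims)

def knowrl_style_reward(thinking: str, answer: str, reference: str, knowledge_base: dict) -> float:
    correctness = is_answer_correct(answer, reference)
    factuality = fact_score(thinking, knowledge_base)
    return 0.5 * correctness + 0.5 * factuality

In the RL stage, this kind of scalar score would be computed for each sampled completion and used as the reward signal for GRPO.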

For complete details on the training configuration and hyperparameters, please refer to our GitHub repository.


Citation

If you find this model useful in your research, please consider citing our paper:

@article{ren2025knowrl,
  title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}}, 
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}