---
license: mit
---

<div align="center">

<h1 align="center"> KnowRL </h1>

<h3 align="center"> Exploring Knowledgeable Reinforcement Learning for Factuality </h3>

<p align="center">
<a href="https://arxiv.org/abs/2506.19807">📄arXiv</a> •
<a href="https://github.com/zjunlp/KnowRL">💻GitHub Repo</a> •
<a href="https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data">📖Dataset</a>
</p>

</div>

---

## Model Description

**KnowRL-Skywork-OR1-7B-Preview** is a slow-thinking language model obtained by applying our **KnowRL** framework to the base model `Skywork-OR1-7B-Preview`.

The **KnowRL (Knowledgeable Reinforcement Learning)** framework mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. The model is trained in two stages:

1. **Cold-Start Supervised Fine-Tuning (SFT)**: The model first aligns with factual, slow-thinking response patterns on a high-quality dataset.
2. **Knowledgeable Reinforcement Learning (RL)**: The model is then trained with a reward signal that explicitly encourages factual accuracy in its reasoning process, helping it learn its own knowledge boundaries.

As a result, this model shows a significant reduction in hallucinations on factuality benchmarks while preserving, or even enhancing, the strong reasoning capabilities inherited from its base model.

## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation tasks. It is important to follow the specific prompt format, which includes `<think>` and `<answer>` tags, to get the best results.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Build the prompt with the model's chat template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
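
If the model's reasoning is wrapped in `<think>` tags and its final answer in `<answer>` tags, as the prompt format above suggests, you may want to separate the two parts after decoding. The helper below is a minimal sketch, assuming these tags appear verbatim in the decoded text (they may be stripped if they are registered as special tokens):

```python
import re

# Illustrative helper, assuming the output contains <think>...</think> and <answer>...</answer> tags.
def split_think_answer(text: str):
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else text.strip(),
    )

thinking, answer = split_think_answer(response)
print("Reasoning:", thinking)
print("Answer:", answer)
```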

### Using `huggingface-cli`

You can also download the model from the command line using `huggingface-cli`.

```bash
huggingface-cli download zjunlp/KnowRL-Skywork-OR1-7B-Preview --local-dir KnowRL-Skywork-OR1-7B-Preview
```
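
If you prefer to stay in Python, the `huggingface_hub` library offers an equivalent download path; this is a minimal sketch, and the local directory name is just an example:

```python
from huggingface_hub import snapshot_download

# Download the full model repository to a local directory of your choice
local_path = snapshot_download(
    repo_id="zjunlp/KnowRL-Skywork-OR1-7B-Preview",
    local_dir="KnowRL-Skywork-OR1-7B-Preview",  # example path
)
print(local_path)
```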

## Training Details

The model is trained in two distinct stages, using data from the `zjunlp/KnowRL-Train-Data` dataset.

* **Stage 1: Cold-Start SFT**: The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
* **Stage 2: Knowledgeable RL**: The SFT-tuned model is further trained with reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is computed by verifying the model's thinking process against an external knowledge base (see the sketch after this list). This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
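
For illustration only, the sketch below shows one way a combined correctness-plus-factuality reward could be assembled. The helper functions, weights, and scoring rules are hypothetical placeholders, not the verifiers used in KnowRL; refer to the GitHub repository for the actual reward implementation.

```python
# Hypothetical sketch of a combined reward; the helpers are simple placeholders.

def check_answer(answer: str, reference: str) -> float:
    """Placeholder correctness reward: exact (case-insensitive) match."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def verify_against_knowledge(thinking: str, knowledge: str) -> float:
    """Placeholder factuality reward: fraction of sentences in the thinking
    trace whose words all appear in the provided knowledge text."""
    sentences = [s.strip() for s in thinking.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        all(word.lower() in knowledge.lower() for word in s.split())
        for s in sentences
    )
    return supported / len(sentences)

def combined_reward(thinking: str, answer: str, reference: str, knowledge: str,
                    w_correct: float = 1.0, w_fact: float = 1.0) -> float:
    """Weighted sum of the correctness and factuality rewards."""
    return (w_correct * check_answer(answer, reference)
            + w_fact * verify_against_knowledge(thinking, knowledge))
```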

For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).

---

## Citation

If you find this model useful in your research, please consider citing our paper:

```bibtex
@article{ren2025knowrl,
  title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```
|