---
license: apache-2.0
tags:
- chain-of-thought
- safety
- alignment
- reasoning
- large-language-model
library_name: transformers
inference: true
---
|
|
|
# SAFEPATH-R-7B
|
|
|
This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, fine-tuned using prefix-only safety priming.
|
|
|
## Model Description
|
|
|
SAFEPATH applies a minimal alignment technique: the phrase *"Let's think about safety first"* (the Safety Primer) is inserted at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
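
As a quick check, here is a minimal inference sketch using `transformers`. The repo id `AI-ISL/SAFEPATH-R-7B` is an assumption for illustration; substitute this repository's actual id. With SAFEPATH alignment, the decoded output is expected to open its reasoning block with the Safety Primer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-ISL/SAFEPATH-R-7B"  # hypothetical repo id, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How do I pick a strong password?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# A SAFEPATH-aligned model is expected to begin its reasoning block with
# the Safety Primer, e.g. "<think> Let's think about safety first ..."
```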
|
|
|
- 🔐 **Improved Safety**: Reduces harmful outputs on safety benchmarks such as StrongReject and BeaverTails, and is robust to jailbreak attacks
- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
- ⚡ **Efficiency**: Fine-tuned with only 100 steps (a sketch of the prefix-only objective follows this list)
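
For illustration only, the sketch below shows what a prefix-only priming objective can look like: the labels are masked so that only the Safety Primer tokens, placed at the start of the reasoning block, receive supervision. The `<think>` delimiter and the helper function are assumptions made for this example; see the paper for the actual training setup.

```python
import torch

PRIMER = "Let's think about safety first"

def prefix_only_labels(tokenizer, prompt: str, think_open: str = "<think>"):
    """Build (input_ids, labels) where labels are -100 everywhere except on
    the primer tokens, so fine-tuning only teaches the model to emit the
    Safety Primer at the start of its reasoning block. Illustrative sketch,
    not the paper's training code."""
    prefix_ids = tokenizer(prompt + think_open, add_special_tokens=False).input_ids
    primer_ids = tokenizer(" " + PRIMER, add_special_tokens=False).input_ids
    input_ids = torch.tensor([prefix_ids + primer_ids])
    labels = torch.full_like(input_ids, -100)  # ignore prompt tokens in the loss
    labels[0, len(prefix_ids):] = input_ids[0, len(prefix_ids):]  # supervise primer only
    return input_ids, labels

# Usage inside a standard training step (the model returns a loss when labels are given):
# input_ids, labels = prefix_only_labels(tokenizer, formatted_prompt)
# loss = model(input_ids=input_ids.to(model.device), labels=labels.to(model.device)).loss
```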
|
|
|
## Intended Use

This model is intended for research in:

- Safety alignment in Large Reasoning Models (LRMs)
- Robust reasoning under adversarial settings
- Chain-of-thought alignment studies
|
|
|
For details, see our [paper](https://arxiv.org/pdf/2505.14667).
|
|
|
## Overview Results

<p align="left">
  <img src="https://github.com/AI-ISL/AI-ISL.github.io/blob/main/static/images/safepath/main_results.png?raw=true" width="800"/>
</p>