AIISL committed on
Commit d269a5b · verified · 1 Parent(s): 88a170d

Update README.md

Files changed (1)
  1. README.md +43 -3
README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - chain-of-thought
+ - safety
+ - alignment
+ - reasoning
+ - large-language-model
+ library_name: transformers
+ inference: true
+ ---
+
+ # SAFEPATH-R-7B
+
+ This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, fine-tuned using prefix-only safety priming.
+
+ ## Model Description
+
+ SAFEPATH applies a minimal alignment technique: the phrase *Let's think about safety first* (the Safety Primer) is inserted at the beginning of the reasoning block. This steers the model toward safer reasoning without reducing its reasoning performance.
+
+ - 🔐 **Improved Safety**: Reduces harmful outputs (e.g., on StrongReject and BeaverTails) and is robust to jailbreak attacks
+ - 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
+ - ⚡ **Efficiency**: Fine-tuned with only 100 steps
+
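+ A minimal usage sketch with the Transformers library is shown below; the repository ID, prompt, and generation settings are illustrative assumptions rather than prescribed values.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Illustrative repository ID; substitute the actual model ID for this card.
+ model_id = "AI-ISL/SAFEPATH-R-7B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+
+ # Standard chat formatting; the Safety Primer ("Let's think about safety first")
+ # is intended to appear at the start of the model's reasoning block.
+ messages = [{"role": "user", "content": "Explain why the square root of 2 is irrational."}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Example sampling settings; adjust to your use case.
+ output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+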
+ ## Intended Use
+
+ This model is intended for research in:
+ - Safety alignment in Large Reasoning Models (LRMs)
+ - Robust reasoning under adversarial settings
+ - Chain-of-thought alignment studies
+
+ ## Evaluation
+
+ The model has been evaluated on:
+ - **Safety benchmarks**: StrongReject, BeaverTails
+ - **Reasoning benchmarks**: MATH500, GPQA, AIME24
+
+ For details, see our [paper](https://arxiv.org/pdf/2505.14667).
+
+ ## Overview Results
+ <p align="left">
+ <img src="https://github.com/AI-ISL/AI-ISL.github.io/blob/main/static/images/safepath/main_results.png?raw=true" width="800"/>
+ </p>