---
license: apache-2.0
tags:
- chain-of-thought
- safety
- alignment
- reasoning
- large-language-model
library_name: transformers
inference: true
---

# SAFEPATH-R-7B

This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, fine-tuned using prefix-only safety priming.

## Model Description

SAFEPATH applies a minimal alignment technique: it inserts the phrase *Let's think about safety first* (the Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance. A usage sketch is given at the end of this card.

- 🔐 **Improved Safety**: Reduces harmful outputs (e.g., on StrongReject and BeaverTails) and is robust to jailbreak attacks
- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
- ⚡ **Efficiency**: Fine-tuned with only 100 steps

## Intended Use

This model is intended for research on:

- Safety alignment in Large Reasoning Models (LRMs)
- Robust reasoning under adversarial settings
- Chain-of-thought alignment studies

For details, see our [paper](https://arxiv.org/pdf/2505.14667).

## Overview Results
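## Usage

Below is a minimal generation sketch using standard 🤗 Transformers APIs. The repo id shown is illustrative (replace it with this model's actual Hub id), and the prompt is an arbitrary example. Because the Safety Primer is trained into the weights via prefix-only fine-tuning, no special prompting should be needed: the model is expected to open its reasoning block with the primer on its own.

```python
# Minimal sketch, assuming standard 🤗 Transformers chat-template APIs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SAFEPATH-R-7B"  # illustrative id; use the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "How should I store household chemicals safely?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The fine-tuned model is expected to begin its reasoning block with the
# Safety Primer ("Let's think about safety first") without extra prompting.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```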