This model is the SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B, fine-tuned using prefix-only safety priming.
Model Description
SAFEPATH applies a minimal alignment technique: it inserts a short Safety Primer, the phrase "Let's think about safety first", at the beginning of the reasoning block. This encourages the model to reason more safely without reducing its reasoning performance.
🔐 Improved Safety: Reduces harmful outputs on benchmarks such as StrongReject and BeaverTails, and is robust to jailbreak attacks
🧠 Preserved Reasoning: Maintains accuracy on MATH500, GPQA, and AIME24
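To make the placement of the Safety Primer concrete, here is a minimal sketch of where the phrase sits relative to the reasoning block. It is only an illustration, not the SAFEPATH training code: the `<think>` opener is assumed from R1-style models, and the helper names `add_safety_primer` and `starts_with_primer` are hypothetical. The fine-tuned model is expected to produce the primer on its own, so a helper like this is mainly useful for inspecting or reproducing the format.

```python
SAFETY_PRIMER = "Let's think about safety first"
# Assumption: R1-style models open their reasoning block with this tag.
THINK_OPEN = "<think>"


def add_safety_primer(prompt: str, primer: str = SAFETY_PRIMER) -> str:
    """Return the prompt followed by a reasoning block that starts with the primer."""
    return f"{prompt}{THINK_OPEN}\n{primer}"


def starts_with_primer(generation: str, primer: str = SAFETY_PRIMER) -> bool:
    """Check whether a generated reasoning block opens with the primer."""
    # Keep only the text after the first reasoning-block opener, if present.
    body = generation.split(THINK_OPEN, 1)[-1]
    return body.lstrip().startswith(primer)


if __name__ == "__main__":
    # Hypothetical prompt text; a real pipeline would apply the chat template first.
    print(add_safety_primer("How do I pick a strong password?\n"))
```

The checker can be run over sampled generations to confirm that the reasoning block really does begin with the primer before any further reasoning tokens.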