AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP

Text Generation

chain-of-thought

large-language-model

text-generation-inference


AIISL committed on May 26

Commit 8df4d47 · verified · 1 Parent(s): a017e5c

Update README.md

Files changed (1)

README.md +5 -5

README.md CHANGED

@@ -16,11 +16,11 @@ This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, f
 ## Model Description
-SAFEPATH applies a minimal alignment technique by inserting the phrase: Let's think about safety first at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
-- 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails)
-- 🧠 **Preserved Reasoning**: Maintains or improves accuracy on MATH500, GPQA, and AIME24
-- ⚡ **Efficient**: Fine-tuned with only 100 steps
 ## Intended Use
@@ -35,7 +35,7 @@ The model has been evaluated on:
 - **Safety benchmarks**: StrongReject, BeaverTails
 - **Reasoning benchmarks**: MATH500, GPQA, AIME24
-For details, see our [paper](https://arxiv.org/abs/TODO).
 ## Overview Results
 <p align="left">

 ## Model Description
+SAFEPATH applies a minimal alignment technique by inserting the phrase: *Let's think about safety first* (Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
+- 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails) and is robust to jailbreak attacks
+- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
+- ⚡ **Efficiency**: Fine-tuned with only 100 steps
 ## Intended Use
 - **Safety benchmarks**: StrongReject, BeaverTails
 - **Reasoning benchmarks**: MATH500, GPQA, AIME24
+For details, see our [paper](https://arxiv.org/pdf/2505.14667).
 ## Overview Results
 <p align="left">
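
The Model Description places the Safety Primer at the beginning of the reasoning block. The sketch below only illustrates that placement on a plain string. It is not the authors' fine-tuning code, and it assumes the reasoning block opens with a `<think>` tag, as DeepSeek-R1-style models emit. The names `SAFETY_PRIMER`, `THINK_OPEN`, and `insert_primer` are hypothetical and chosen for this example.

```python
# Illustrative sketch only: shows where the Safety Primer sits in a
# reasoning trace. It does not reproduce the SAFEPATH training procedure.

SAFETY_PRIMER = "Let's think about safety first"  # phrase quoted in the README
THINK_OPEN = "<think>"  # assumption: reasoning block opens with this tag


def insert_primer(text: str) -> str:
    """Insert the Safety Primer right after the first reasoning tag.

    Returns the text unchanged if no reasoning tag is found or if the
    reasoning block already starts with the primer.
    """
    idx = text.find(THINK_OPEN)
    if idx == -1:
        return text  # no reasoning block to prime

    head_end = idx + len(THINK_OPEN)
    rest = text[head_end:]
    if rest.lstrip().startswith(SAFETY_PRIMER):
        return text  # already primed, keep the operation idempotent

    return text[:head_end] + "\n" + SAFETY_PRIMER + rest


if __name__ == "__main__":
    trace = "<think>\nThe user asks about X.</think>Answer."
    print(insert_primer(trace))
```

According to the README, the released model is fine-tuned to produce this prefix itself, so a helper like this would mainly be useful for building or inspecting example traces.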