Update README.md
README.md (changed)
@@ -16,11 +16,11 @@ This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, f
 
 ## Model Description
 
-SAFEPATH applies a minimal alignment technique by inserting the phrase: Let's think about safety first at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
+SAFEPATH applies a minimal alignment technique by inserting the phrase: *Let's think about safety first* (Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
 
-- 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails)
-- 🧠 **Preserved Reasoning**: Maintains
-- ⚡ **
+- 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails) and is robust to jailbreak attacks
+- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
+- ⚡ **Efficiency**: Fine-tuned with only 100 steps
 
 ## Intended Use
 
@@ -35,7 +35,7 @@ The model has been evaluated on:
 - **Safety benchmarks**: StrongReject, BeaverTails
 - **Reasoning benchmarks**: MATH500, GPQA, AIME24
 
-For details, see our [paper](https://arxiv.org/
+For details, see our [paper](https://arxiv.org/pdf/2505.14667).
 
 ## Overview Results
 <p align="left">
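The Model Description text in this change says SAFEPATH places the phrase *Let's think about safety first* (the Safety Primer) at the beginning of the reasoning block. The sketch below illustrates only that placement: it formats an assistant turn whose reasoning opens with the primer. It is not the SAFEPATH training code. The `<think>`/`</think>` delimiters (as used by DeepSeek-R1-style models) and the helper name `with_safety_primer` are assumptions made for illustration.

```python
# Phrase taken from the README; everything else here is a hypothetical sketch.
SAFETY_PRIMER = "Let's think about safety first"


def with_safety_primer(reasoning: str, answer: str, primer: str = SAFETY_PRIMER) -> str:
    """Hypothetical helper: build an assistant turn whose reasoning block starts with the primer.

    Assumes DeepSeek-R1-style <think>...</think> delimiters around the reasoning.
    """
    reasoning = reasoning.strip()
    if not reasoning.startswith(primer):
        # Put the primer first so it is the opening text of the reasoning block.
        reasoning = f"{primer}\n{reasoning}" if reasoning else primer
    # Close the reasoning block, then append the final answer.
    return f"<think>\n{reasoning}\n</think>\n\n{answer.strip()}"


if __name__ == "__main__":
    print(with_safety_primer("The request asks for something harmful.", "I can't help with that."))
```

The helper leaves text that already starts with the primer unchanged, so applying it twice does not duplicate the phrase.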