Commit 8df4d47 by AIISL · verified · 1 Parent(s): a017e5c

Update README.md
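
The Model Description touched by this commit says SAFEPATH inserts the phrase *Let's think about safety first* (the Safety Primer) at the beginning of the reasoning block. As a rough illustration only, the prefixing step might look like the sketch below. The `<think>` delimiter and the helper name `prime_reasoning` are assumptions made for this example; the README names neither, and this is not the authors' training or inference code.

```python
SAFETY_PRIMER = "Let's think about safety first"  # phrase quoted in the README

# Assumption: the reasoning block opens with a "<think>" tag, as in
# DeepSeek-R1-style outputs; the README does not name the delimiter.
THINK_OPEN = "<think>"


def prime_reasoning(prompt: str, primer: str = SAFETY_PRIMER) -> str:
    """Return prompt text whose reasoning block starts with the primer.

    Hypothetical helper for illustration; not part of the SAFEPATH release.
    """
    base = prompt.rstrip()
    # Open the reasoning block if the prompt does not already end with it.
    if not base.endswith(THINK_OPEN):
        base = f"{base}\n{THINK_OPEN}"
    # Place the primer on the first line inside the reasoning block.
    return f"{base}\n{primer}"


if __name__ == "__main__":
    print(prime_reasoning("User: How do I store cleaning chemicals safely?"))
```

The model would then continue generating from the primed prefix, so the primer only seeds the first line of the reasoning and leaves the rest of the block unconstrained.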

  1. README.md +5 -5
README.md CHANGED
@@ -16,11 +16,11 @@ This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Qwen-7B**, f
 
  ## Model Description
 
- SAFEPATH applies a minimal alignment technique by inserting the phrase: Let's think about safety first at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
+ SAFEPATH applies a minimal alignment technique by inserting the phrase: *Let's think about safety first* (Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
 
- - 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails)
- - 🧠 **Preserved Reasoning**: Maintains or improves accuracy on MATH500, GPQA, and AIME24
- - ⚡ **Efficient**: Fine-tuned with only 100 steps
+ - 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails) and is robust to jailbreak attacks
+ - 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
+ - ⚡ **Efficiency**: Fine-tuned with only 100 steps
 
  ## Intended Use
 
@@ -35,7 +35,7 @@ The model has been evaluated on:
  - **Safety benchmarks**: StrongReject, BeaverTails
  - **Reasoning benchmarks**: MATH500, GPQA, AIME24
 
- For details, see our [paper](https://arxiv.org/abs/TODO).
+ For details, see our [paper](https://arxiv.org/pdf/2505.14667).
 
  ## Overview Results
  <p align="left">