Update README.md
README.md
CHANGED
@@ -10,9 +10,9 @@ library_name: transformers
 inference: true
 ---
 
-# SAFEPATH-R-
+# SAFEPATH-R-8B
 
-This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-
+This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Llama-8B**, fine-tuned using prefix-only safety priming.
 
 ## Model Description
 
@@ -20,7 +20,7 @@ SAFEPATH applies a minimal alignment technique by inserting the phrase: *Let's t
 
 - 🔐 **Improved Safety**: Reduces harmful outputs (e.g., StrongReject, BeaverTails) and is robust to jailbreak attacks
 - 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
-- ⚡ **Efficiency**: Fine-tuned with only
+- ⚡ **Efficiency**: Fine-tuned with only 20 steps
 
 ## Intended Use
 