Commit d3a830f by hendrydong
Parent: 89579a9
Update README.md
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ license: cc-by-nc-4.0
 
 This reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.
 
+The license is derived from `PKU-Alignment/PKU-SafeRLHF-30K`.
+
 ## Training
 The base model is `meta-llama/Meta-Llama-3-8B-Instruct`.
 