Update README.md
README.md
CHANGED
@@ -1,3 +1,17 @@
+Ideally, please use the following format for the reward evaluations,
+```python
+def format(messages):
+    format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
+
+    for message in messages:
+        if message['role'] == "user":
+            format_text = format_text + "\nUser: " + message['content']
+        elif message['role'] == 'assistant':
+            format_text = format_text + "\nAssistant: " + message['content']
+    return format_text
+```
+
+
 The reward model can be used for iterative SFT/DPO
 
 ```
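As a rough illustration of how this prompt format and the reward scores could feed an iterative SFT/DPO loop, here is a minimal sketch. It reuses the `format` function added in the diff above; `score_with_reward_model` is a hypothetical placeholder for whatever call actually runs the reward model and parses its 0-100 score, and the sample prompt and candidate responses are invented for illustration.

```python
# Minimal sketch of best-of-N selection for iterative SFT/DPO,
# reusing the format() function from the README change above.
# score_with_reward_model() is a hypothetical placeholder, not a real API.

def format(messages):
    format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
    for message in messages:
        if message['role'] == "user":
            format_text = format_text + "\nUser: " + message['content']
        elif message['role'] == 'assistant':
            format_text = format_text + "\nAssistant: " + message['content']
    return format_text

def score_with_reward_model(prompt_text):
    # Placeholder: run the reward model on prompt_text and parse the
    # 0-100 score it generates. Returns a dummy value here.
    return 50.0

user_prompt = "Explain what a reward model is."
candidate_responses = [
    "A reward model assigns a scalar quality score to a response.",
    "It is a kind of fruit.",
]

# Score every candidate response to the same user prompt.
scored = []
for response in candidate_responses:
    messages = [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": response},
    ]
    scored.append((score_with_reward_model(format(messages)), response))

scored.sort(key=lambda pair: pair[0], reverse=True)

# The highest-scoring response can serve as an SFT target; the highest- and
# lowest-scoring responses form a (chosen, rejected) pair for DPO.
chosen, rejected = scored[0][1], scored[-1][1]
print("chosen:", chosen)
print("rejected:", rejected)
```

In an actual iterative loop, such chosen/rejected pairs would be used for the next round of SFT or DPO training, and the process repeated with the updated policy.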