feihu.hf committed
Commit · b088fbb
1 Parent(s): bd97555
update README
README.md
CHANGED
@@ -88,8 +88,9 @@ To achieve optimal performance, we recommend the following settings:
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality.
 
 2. **Sampling Parameters**:
-   - Use Temperature=0.6
+   - Use Temperature=0.6, TopP=0.95, MinP=0 instead of Greedy decoding to avoid endless repetitions.
    - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
+   - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may result in occasional language mixing and a slight decrease in performance.
 
 3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This feature is already implemented in `apply_chat_template`.
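The "Enforce Thoughtful Output" recommendation amounts to making sure the prompt fed to the model ends with an opening think tag. A minimal sketch, assuming a plain string prompt (the model's real chat template may already append this tag itself, so the helper below is purely illustrative):

```python
def build_prompt(chat_prompt: str) -> str:
    """Ensure the prompt ends with an opening think tag.

    Illustrative helper (not part of any library): forcing the model to
    start its turn with "<think>\n" prevents it from emitting an empty
    thinking block, which can degrade output quality.
    """
    if not chat_prompt.endswith("<think>\n"):
        chat_prompt += "<think>\n"
    return chat_prompt
```

The check makes the helper idempotent, so applying it to a prompt that a chat template has already terminated with the tag changes nothing.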
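The updated sampling recommendations (Temperature=0.6, TopP=0.95, TopK between 20 and 40, MinP=0) can be illustrated with a toy, framework-free sampler. Real inference stacks apply these filters inside their `generate` call; the standalone sketch below only mirrors the idea of how the four filters interact on a list of raw logits:

```python
import math

def filter_logits(logits, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0):
    """Toy illustration of the recommended sampling filters.

    Applies temperature scaling, then top-k, top-p (nucleus), and min-p
    filtering, returning the surviving (token_index, probability) pairs
    renormalized to sum to 1. Not a library API; for illustration only.
    """
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break

    # Min-p: drop tokens below min_p times the best token's probability.
    floor = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= floor]

    z = sum(p for p, _ in kept)
    return [(i, p / z) for p, i in kept]
```

With greedy decoding only the single most likely token survives every step, which is what makes endless repetitions likely; the temperature/top-p combination above keeps a small, renormalized pool to sample from instead.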
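The "No Thinking Content in History" rule can be sketched as a small helper that strips `<think>...</think>` blocks from earlier assistant turns before re-feeding the conversation. This mirrors what `apply_chat_template` already implements; the helper and the message format here are illustrative assumptions, not the template's actual code:

```python
import re

def strip_thinking(history):
    """Remove <think>...</think> blocks from past assistant turns.

    Illustrative sketch of the behavior `apply_chat_template` provides:
    only the final answer from earlier assistant turns is kept in the
    prompt, never the thinking content.
    """
    cleaned = []
    for msg in history:
        if msg["role"] == "assistant":
            content = re.sub(r"<think>.*?</think>\s*", "",
                             msg["content"], flags=re.DOTALL)
            cleaned.append({"role": "assistant", "content": content})
        else:
            cleaned.append(msg)
    return cleaned
```

The `re.DOTALL` flag lets the pattern span the newlines inside a thinking block, and non-assistant messages pass through untouched.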