Qwen — Text Generation · GGUF · English · chat · conversational
feihu.hf committed on
Commit b088fbb · 1 Parent(s): bd97555

update README
Files changed (1):
1. README.md +2 -1
README.md CHANGED
```diff
@@ -88,8 +88,9 @@ To achieve optimal performance, we recommend the following settings:
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality.
 
 2. **Sampling Parameters**:
-- Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions.
+- Use Temperature=0.6, TopP=0.95, MinP=0 instead of Greedy decoding to avoid endless repetitions.
 - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
+- For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may result in occasional language mixing and a slight decrease in performance.
 
 3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This feature is already implemented in `apply_chat_template`.
```
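
The updated bullet recommends MinP=0 alongside TopP and TopK. As a rough illustration of how these three filters interact before a token is drawn, here is a toy sketch over a five-token vocabulary. This is a simplified view for intuition only, not llama.cpp's actual sampler; the filter order (TopK, then MinP, then TopP) and the helper name are assumptions for the sketch.

```python
import math

def filter_candidates(logits, top_k=20, top_p=0.95, min_p=0.0):
    """Toy sketch: which tokens survive the recommended TopK/MinP/TopP filters.

    Not llama.cpp's real sampler; filter order is an assumption for illustration.
    """
    # Softmax over the raw logits, sorted most-probable first.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)
    # TopK: keep only the k most probable tokens.
    probs = probs[:top_k]
    # MinP: drop tokens whose probability falls below min_p * (top probability).
    # MinP=0, as recommended above, disables this filter entirely.
    cutoff = min_p * probs[0][1]
    probs = [(tok, p) for tok, p in probs if p >= cutoff]
    # TopP (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append(tok)
        mass += p
        if mass >= top_p:
            break
    return kept

toy_logits = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 0.0, "e": -1.0}
print(filter_candidates(toy_logits))  # nucleus keeps "a", "b", "c"
```

With the recommended settings, sampling then draws from the surviving candidates at Temperature=0.6 rather than greedily taking the top token, which is what breaks the repetition loops the bullet warns about.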