feihu.hf committed
Commit · b088fbb
1 Parent(s): bd97555
update README
README.md
CHANGED
@@ -88,8 +88,9 @@ To achieve optimal performance, we recommend the following settings:
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality.
 
 2. **Sampling Parameters**:
-   - Use Temperature=0.6
+   - Use Temperature=0.6, TopP=0.95, MinP=0 instead of Greedy decoding to avoid endless repetitions.
    - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
+   - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may result in occasional language mixing and a slight decrease in performance.
 
 3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This feature is already implemented in `apply_chat_template`.
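The "Enforce Thoughtful Output" recommendation amounts to making sure the prompt fed to the model ends with an opening think tag. A minimal sketch, assuming a plain string prompt (the model's real chat template may already append this tag itself, so the helper below is purely illustrative):

```python
def build_prompt(chat_prompt: str) -> str:
    """Ensure the prompt ends with an opening think tag.

    Illustrative helper (not part of any library): forcing the model to
    start its turn with "<think>\n" prevents it from emitting an empty
    thinking block, which can degrade output quality.
    """
    if not chat_prompt.endswith("<think>\n"):
        chat_prompt += "<think>\n"
    return chat_prompt
```

The check makes the helper idempotent, so applying it to a prompt that a chat template has already terminated with the tag changes nothing.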
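The updated sampling recommendations (Temperature=0.6, TopP=0.95, TopK between 20 and 40, MinP=0) can be illustrated with a toy, framework-free sampler. Real inference stacks apply these filters inside their `generate` call; the standalone sketch below only mirrors the idea of how the four filters interact on a list of raw logits:

```python
import math

def filter_logits(logits, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0):
    """Toy illustration of the recommended sampling filters.

    Applies temperature scaling, then top-k, top-p (nucleus), and min-p
    filtering, returning the surviving (token_index, probability) pairs
    renormalized to sum to 1. Not a library API; for illustration only.
    """
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break

    # Min-p: drop tokens below min_p times the best token's probability.
    floor = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= floor]

    z = sum(p for p, _ in kept)
    return [(i, p / z) for p, i in kept]
```

With greedy decoding only the single most likely token survives every step, which is what makes endless repetitions likely; the temperature/top-p combination above keeps a small, renormalized pool to sample from instead.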
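The "No Thinking Content in History" rule can be sketched as a small helper that strips `<think>...</think>` blocks from earlier assistant turns before re-feeding the conversation. This mirrors what `apply_chat_template` already implements; the helper and the message format here are illustrative assumptions, not the template's actual code:

```python
import re

def strip_thinking(history):
    """Remove <think>...</think> blocks from past assistant turns.

    Illustrative sketch of the behavior `apply_chat_template` provides:
    only the final answer from earlier assistant turns is kept in the
    prompt, never the thinking content.
    """
    cleaned = []
    for msg in history:
        if msg["role"] == "assistant":
            content = re.sub(r"<think>.*?</think>\s*", "",
                             msg["content"], flags=re.DOTALL)
            cleaned.append({"role": "assistant", "content": content})
        else:
            cleaned.append(msg)
    return cleaned
```

The `re.DOTALL` flag lets the pattern span the newlines inside a thinking block, and non-assistant messages pass through untouched.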