Re-uploaded GGUFs with removed <think> tokens for better outputs
#4 · pinned · opened by danielhanchen
Hey guys, we saw some people having issues using the model in tools other than llama.cpp. We re-uploaded the GGUFs, and we verified that removing the <think> token is fine, since the model's probability of producing it is nearly 100% anyway.
This should make LM Studio, Ollama, and inference engines other than llama.cpp work! Please re-download the weights, or, as @redeemer mentioned, simply delete the <think> token in the chat template, i.e. change the below:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
to:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
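If you'd rather patch a local copy of the template programmatically than edit it by hand, the one-line substitution above can be sketched like this (a minimal example, assuming the template text matches the snippet shown; note the Jinja file stores the newline escapes as literal backslash-n, hence the raw strings):

```python
def strip_think_token(template: str) -> str:
    """Replace the assistant prompt that force-opens <think> with the plain one."""
    return template.replace(
        r"{{- '<|im_start|>assistant\n<think>\n' }}",
        r"{{- '<|im_start|>assistant\n' }}",
    )

# Demonstrate on the snippet from this post:
original = (
    "{%- if add_generation_prompt %}\n"
    r"{{- '<|im_start|>assistant\n<think>\n' }}" "\n"
    "{%- endif %}"
)
patched = strip_think_token(original)
print(patched)
```

To apply it to a downloaded template, read the file (e.g. chat_template.jinja), pass the text through `strip_think_token`, and write it back.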