Re-uploaded GGUFs with removed <think> tokens for better outputs

#4
by danielhanchen - opened
Unsloth AI org • edited 5 days ago

Hey guys, we saw some people having issues using the model in tools other than llama.cpp. We re-uploaded the GGUFs and verified that removing the <think> token is fine, since the model's probability of producing it is nearly 100% anyway.

This should make LM Studio, Ollama, and other inference engines besides llama.cpp work! Please re-download the weights or, as @redeemer mentioned, simply delete the <think> token from the chat template, i.e. change the below:

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}

to:

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}

See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
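If you'd rather patch a local copy of the template yourself, here is a minimal sketch of that one-line edit as a string replacement. The helper name and file path are illustrative, not an official tool; it simply swaps the two snippets shown above:

```python
# Minimal sketch: strip the hard-coded <think> from a chat template string.
# The helper name below is illustrative, not an official Unsloth tool.

OLD = "{{- '<|im_start|>assistant\\n<think>\\n' }}"  # generation prompt with forced <think>
NEW = "{{- '<|im_start|>assistant\\n' }}"            # generation prompt without it

def strip_think(template: str) -> str:
    """Return the template with the forced <think> removed (no-op if absent)."""
    return template.replace(OLD, NEW)

# To patch a local copy of the template (path is illustrative):
#   from pathlib import Path
#   p = Path("chat_template.jinja")
#   p.write_text(strip_think(p.read_text()))
```

Note the replacement is a no-op when the template has no forced <think>, so it is safe to run on the already re-uploaded files.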

