Re-uploaded GGUFs with removed <think> tokens for better outputs
#4 · pinned · opened by danielhanchen
Hey guys, we saw some people having issues using the model in tools other than llama.cpp. We re-uploaded the GGUFs, and we verified that removing the <think> token is fine, since the model's probability of producing it is nearly 100% anyway.
This should make LM Studio, Ollama, and inference engines other than llama.cpp work! Please re-download the weights, or, as @redeemer mentioned, simply delete the <think> token in the chat template, i.e. change the below:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
to:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
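If you'd rather patch a local copy of the template programmatically than edit it by hand, the one-line substitution above can be sketched like this (a minimal example, assuming the template text matches the snippet shown; note the Jinja file stores the newline escapes as literal backslash-n, hence the raw strings):

```python
def strip_think_token(template: str) -> str:
    """Replace the assistant prompt that force-opens <think> with the plain one."""
    return template.replace(
        r"{{- '<|im_start|>assistant\n<think>\n' }}",
        r"{{- '<|im_start|>assistant\n' }}",
    )

# Demonstrate on the snippet from this post:
original = (
    "{%- if add_generation_prompt %}\n"
    r"{{- '<|im_start|>assistant\n<think>\n' }}" "\n"
    "{%- endif %}"
)
patched = strip_think_token(original)
print(patched)
```

To apply it to a downloaded template, read the file (e.g. chat_template.jinja), pass the text through `strip_think_token`, and write it back.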