`<think>` and `</think>` missing
does this occur with the Q8 quant as well?
I'm using the Q4_K_M quant because that's the largest that fits in my 32 GB. Not sure about Q8_0.
It looks like a user on r/LocalLLaMA has run into the same issue with the Q4_K_M quant.
I can confirm that on the llama.cpp server, when using `--jinja --reasoning-format none`, the starting `<think>` tag is missing (it works without `--jinja`). I believe the issue lies in the Jinja chat template inside the GGUF file, particularly in the last lines:
```jinja
...
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
```
It seems to work when I change it to:

```jinja
...
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
```
@redeemer We directly utilized Qwen3's thinking chat template. You need to use `--jinja`, since it adds the think token; otherwise you need to set the reasoning format to qwen3, not none.
For LM Studio, you can try copying and pasting the chat template from Qwen3-30B-A3B and see if that works, but I think that's an LM Studio issue.
@danielhanchen I'm using `--reasoning-format none` on purpose, to let my own app split the content into reasoning and non-reasoning by itself.
Providing the following Jinja template, with the fix mentioned above, to the llama.cpp server via `--chat-template-file <file>` seems to fix it:
```jinja
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = ((content.split('</think>')|first).rstrip('\n').split('<think>')|last).lstrip('\n') %}
{%- set content = (content.split('</think>')|last).lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
```
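
For reference, here is a rough sketch of the kind of client-side splitting mentioned above when running with `--reasoning-format none` (illustrative only; it assumes the reply closes its reasoning with `</think>`, while the opening `<think>` may be absent because the chat template injects it into the prompt):

```python
# Illustrative sketch: split a raw completion into reasoning and answer parts
# when llama.cpp runs with --reasoning-format none (no server-side splitting).
# Assumes the model closes its reasoning with </think>; the opening <think>
# may be missing when the chat template pre-inserts it.
def split_reasoning(text: str) -> tuple[str, str]:
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        reasoning = reasoning.replace("<think>", "", 1).strip()
        return reasoning, answer.lstrip()
    return "", text  # no reasoning block found


print(split_reasoning("<think>\nchain of thought\n</think>\n\nFinal answer."))
print(split_reasoning("chain of thought\n</think>\n\nFinal answer."))  # missing opening tag
```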
`</think>` is missing for me on Q4_K_M on the latest llama.cpp as well.
> @redeemer We directly utilized Qwen3's thinking chat template. You need to use jinja since it adds the think token. Otherwise you need to set reasoning format to qwen3 not none.
> For lmstudio, you can try copying and pasting the chat template for Qwen3-30B-A3B and see if that works but I think that's an lmstudio issue
By the way, there is no `--reasoning-format qwen3`. The only available choices are `none` or `deepseek`.
I think it's important to remember what the Qwen team said in the model card:

> Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.

So removing the think tag from the chat template (as in the modified template above) fixes llama.cpp's behavior, but only because the model then adds the think tag itself when it starts to respond, and llama.cpp recognizes it as a proper think tag.
We should follow the official advice and force the think tag in the template (which Unsloth is already correctly doing in their template), and llama.cpp should recognize it and treat a think tag inside the template the same as a think tag generated in the model's response.
So right now I'm thinking the correct fix would be on the llama.cpp side.
So some of you probably tried the bartowski quant... any difference?
Let me investigate this a bit further today and I'll update you guys!
Just to clarify, the missing opening `<think>` tag is a problem only in specific cases, when using the llama.cpp server with `--jinja --reasoning-format none`.
If you don't use `--reasoning-format none`, it falls back to the default `<think>` and `</think>` tags (called deepseek in llama.cpp), and the llama.cpp server properly splits the reasoning content on its own - the server's API endpoint returns it under the "reasoning_content" key, while the "normal" content is returned under the "content" key.
@YearZero: Also, I don't think it's fixable in llama.cpp, because the whole point of `--reasoning-format none` is for the llama.cpp server to pass everything through as-is (as "normal" content) - so llama.cpp doesn't know what the reasoning open/close tags are.
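
For anyone who wants to check this from their own client, a minimal sketch of reading both keys from the server's OpenAI-compatible endpoint (assuming llama-server runs locally on port 8080 with `--jinja` and the default deepseek-style reasoning format; field names as described above):

```python
import requests

# Minimal sketch: query a local llama.cpp server and print the split fields.
# Assumes the server runs at http://localhost:8080 with --jinja and the default
# (deepseek-style) reasoning format, so the reasoning is returned separately.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "max_tokens": 512,
    },
    timeout=600,
)
msg = resp.json()["choices"][0]["message"]
print("reasoning_content:", msg.get("reasoning_content"))  # the <think> ... </think> part
print("content:", msg["content"])                          # the final answer
```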
So when I use "--reasoning-format deepseek" (and also --jinja) it no longer outputs the thinking tokens to the screen in llama-server, but it hides them inside of 3 dots (...) without allowing me to expand/collapse to see the thinking tokens at all. Is that intended behavior? Is there a way to make llamacpp allow me to expand/collapse the reasoning parts in this case?
Would adding "--reasoning-format qwen" be a viable solution for llamacpp devs, since "none" and "deepseek" don't provide the desired behavior for these models?
@YearZero I might be wrong, but I think `--reasoning-format deepseek` (the default) is correct for Qwen3 - they use the same `<think>` and `</think>` tokens. However, whether it is displayed depends on the inference app or script and if it properly handles the split between `reasoning_content` and `content`. I'm using my own custom inference app, which does not. I also just checked the built-in web UI of the llama.cpp server - and interestingly, it seems that it's not showing `reasoning_content` either. I'm attaching a screenshot where you can see both `reasoning_content` and `content` being returned from the server (in the Chrome Web Developer Tools network tab), but `reasoning_content` isn't showing up anywhere. The model is "thinking" internally - it's just not being displayed.
As an update, I reuploaded the model.
@johnbean393 @netroy @soulhacker @redeemer @urtuuuu - we verified that removing the `<think>` is fine, since the model's probability of producing the think token seems to be nearly 100% anyway.
This should make llama.cpp / LM Studio inference work! Please redownload the weights or, as @redeemer mentioned, simply delete the `<think>` token in the chat template, i.e. change the below:
```jinja
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
```
to:
```jinja
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
```
See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
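
As a side note, a rough way to sanity-check the "nearly 100%" claim with transformers - this is an illustrative sketch, not the exact verification that was run; it assumes the model fits in local memory and strips the pre-inserted `<think>` if the template still adds it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: estimate the probability that the model's first generated
# token is <think> when the chat template does NOT pre-insert it.
model_id = "unsloth/Qwen3-30B-A3B-Thinking-2507"  # large; any Qwen3 thinking model shows the idea
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
# If the template still appends "<think>\n", strip it so we measure the model's own choice.
prompt = prompt.removesuffix("<think>\n")

inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

think_id = tok.convert_tokens_to_ids("<think>")
print(f"P(next token = <think>) = {probs[think_id].item():.4f}")
```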
Thanks @danielhanchen for linking the template. That was so maddening I tried to get Qwen to diagnose her own template. But she died once she typed `<|im_end|>`.
Can confirm that updating the Jinja template in LM Studio to the one linked above fixes the thinking issue in LM Studio.
Looking good for me too!
I couldn't handle that template any longer. If you want a denser thinking section, do the following:
Replace this line:
```jinja
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
```
with this one:
```jinja
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>' + content.lstrip('\n') }}
```