New Chat Template + Tool Calling Fixes as of 05 Aug, 2025

#10
by shimmyshimmer (Unsloth AI org) • opened • edited 4 days ago

Although we previously addressed tool calling issues, the fix only worked in certain setups, such as llama.cpp. With other configurations, tool functionality remained inconsistent.

This new update has undergone extensive testing, by us and others, and should significantly improve tool calling reliability and mostly resolve any strange behaviors.

IMPORTANT:
You must update llama.cpp as they have also fixed some issues!

This issue affected all uploads of the model, regardless of the uploader. We did not introduce this problem or break the model in our quantizations; in fact, we've now fixed it. For correct chat template behavior and working tool calling, you must use our quants. Other quants (not uploaded by us) do not properly support tool calling.

shimmyshimmer pinned discussion
shimmyshimmer changed discussion title from New Chat Fixes + Tool Calling Fixes as of 05 Aug, 2025 to New Chat Template + Tool Calling Fixes as of 05 Aug, 2025

I am using the ollama/ollama:rocm Docker image. How can I apply this fix, or how can it support tool calling?
The API returns " does not support tools".

Can you share the template here or somewhere?

I don't want to re-download the GGUF, but a fix would be nice. I could then load the fixed template.

Thanks!
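(A hedged note for the llama.cpp side of this: llama-server can load a template from a local file at startup, so applying a fixed template shouldn't require re-downloading the GGUF. A minimal sketch, where qwen3-coder-fixed.jinja is a hypothetical filename for the saved template:

llama-server -m Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf --jinja --chat-template-file qwen3-coder-fixed.jinja

Ollama is a different story: it uses its own Go-template format in the Modelfile, so a Jinja template can't be dropped in there directly.)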

Can confirm that it behaves way better than before (using UD-Q4_K_XL). 👍👍

{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n\n\nFor each function call, return a json object with function name and arguments within XML tags:\n\n{"name": , "arguments": }\n<|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}

Here is the template I have been using. I made it from a variety of sources, manually debugging, and with the help of AI models.

It seemed to work, but perhaps there are issues with it that are not obvious.
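One cheap way to check a template like this outside any inference server is to render it directly with the jinja2 package. A minimal sketch, assuming the file is saved as qwen3_chat_template.jinja and that plain Jinja2 (which ships the tojson filter this template uses) is close enough to what llama.cpp and transformers do:

# Render the chat template against a toy conversation and one tool definition.
# The tool and messages below are illustrative, not from this thread.
from jinja2 import Environment

env = Environment()
with open("qwen3_chat_template.jinja") as f:
    template = env.from_string(f.read())

tools = [{
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

# Eyeball the output: <tools>, <tool_call> and the <|im_start|>/<|im_end|>
# markers should each show up exactly where you expect them.
print(template.render(messages=messages, tools=tools, add_generation_prompt=True))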

One thing that I notice is that, when using my template, CLI tools like Claude Code and Qwen Code don't print raw tags to the terminal like:
<function=Read

● Read(qwen3_chat_template.jinja)
⎿  Read 135 lines (ctrl+r to expand)

They do with the template located in the Unsloth Hugging Face non-GGUF model repo (which I assume is the same as the one in the newly updated GGUF files).

I am using the ollama/ollama:rocm Docker image. How can I apply this fix, or how can it support tool calling?
The API returns " does not support tools".

Or Ollama in general. I am experiencing silent tool-call failures; the AI simply stops with no tool call.

This seems to somewhat work with qwen-code (with some oddity), but it fails with codex.

qwen-code output:

  Let me search for relevant code patterns.
  <tool_call>
  <function=search_file_content

I don't think those <tool_call> / <function= parts are supposed to be visible? The result then causes qwen-code to make a 1M+ token request, which obviously fails. I don't know if this is because qwen-code is stupid or if it's a tool parsing bug.

With codex:

command running...
$ find . -name '*.py' -o -name '*.js' -o -name '*.rs' -o -name '*.cpp' -o -name '*.h' -o -name '*.hpp'

codex
I'll help you find where the URL is defined in the implementation. Let me explore the codebase to locate this information.
<tool_call>
<function=shell

codex

<parameter=command>
["find", "/workspace", "-type", "f", "-name", ".py", "-o", "-name", ".js", "-o", "-name", ".rs", "-o", "-name", ".cpp", "-o", "-name", ".h", "-o", "-name", ".hpp"]


</tool_call>

And nothing is actually called (hmm, or maybe it's not realizing this failed because "." became "/workspace").

It works with Claude Code (but it might have the same weird outputs as qwen-code).

I think the issue is that most tools expect tool calls to be formatted as JSON, while this model uses XML with some extra tags that some tools do not expect. Some tools can handle it well enough and just show some odd formatting; others seem to break entirely.

If I am correct, a possible solution would be a proxy: you send the request to the model API as normal (llama.cpp server in my case), and the proxy rewrites the returned values into the format the end-user tooling expects.
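If it helps anyone experiment with that idea, here is a rough sketch in Python, stdlib only. To be clear, this is an illustration, not part of the Unsloth fix: the ports, the /chat/completions path check, and the exact <tool_call> JSON shape are all assumptions, and it handles non-streaming responses only.

# Assumptions (not from this thread): llama-server listens on localhost:8181,
# the proxy listens on localhost:8282, and leaked tool calls look like
# <tool_call>{"name": ..., "arguments": ...}</tool_call> inside message content.
import json
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:8181"
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def rewrite(body: bytes) -> bytes:
    # Move XML-wrapped tool calls out of `content` into OpenAI-style `tool_calls`.
    data = json.loads(body)
    for choice in data.get("choices", []):
        msg = choice.get("message") or {}
        content = msg.get("content") or ""
        calls = []
        for i, m in enumerate(TOOL_CALL_RE.finditer(content)):
            try:
                call = json.loads(m.group(1))
            except json.JSONDecodeError:
                continue  # leave unparseable blocks alone
            calls.append({
                "id": f"call_{i}",
                "type": "function",
                "function": {
                    "name": call.get("name", ""),
                    # OpenAI-style clients expect `arguments` as a JSON *string*
                    "arguments": json.dumps(call.get("arguments", {})),
                },
            })
        if calls:
            msg["tool_calls"] = calls
            msg["content"] = TOOL_CALL_RE.sub("", content).strip() or None
            choice["finish_reason"] = "tool_calls"
    return json.dumps(data).encode()

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Forward the request body to the upstream server unchanged...
        length = int(self.headers.get("Content-Length", 0))
        req = urllib.request.Request(UPSTREAM + self.path,
                                     data=self.rfile.read(length),
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:  # sketch: assumes HTTP 200
            body = resp.read()
        # ...and rewrite only chat completion responses on the way back.
        if self.path.endswith("/chat/completions"):
            body = rewrite(body)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8282), Proxy).serve_forever()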

I gave this a shot. Updated llama.cpp and downloaded the fresh Q6 UD quant. I re-ran all my tests. It's still performing just as badly as before, unfortunately. Actually worse in RooCode, and it still doesn't do much of anything in Qwen Code.

I still don't understand why Qwen decided to make this one model handle tools differently (all the other models seem to work perfectly fine, if I understand correctly). If they were trying to push a technological improvement, you'd at least expect it to work in their own product, Qwen Code, right? I don't get the logic of making a change and not at least having it work in their own purpose-built coding solution. Just a perplexing decision.

To be extra sure... confirming that this is the correct template now?

{# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth Chat template fixes #}
{% macro render_item_list(item_list, tag_name='required') %}
    {%- if item_list is defined and item_list is iterable and item_list | length > 0 %}
        {%- if tag_name %}{{- '\n<' ~ tag_name ~ '>' -}}{% endif %}
            {{- '[' }}
                {%- for item in item_list -%}
                    {%- if loop.index > 1 %}{{- ", "}}{% endif -%}
                    {%- if item is string -%}
                        {{ "`" ~ item ~ "`" }}
                    {%- else -%}
                        {{ item }}
                    {%- endif -%}
                {%- endfor -%}
            {{- ']' }}
        {%- if tag_name %}{{- '</' ~ tag_name ~ '>' -}}{% endif %}
    {%- endif %}
{% endmacro %}

{%- if messages[0]["role"] == "system" %}
    {%- set system_message = messages[0]["content"] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{%- if not tools is defined %}
    {%- set tools = [] %}
{%- endif %}

{%- if system_message is defined %}
    {{- "<|im_start|>system\n" + system_message }}
{%- else %}
    {%- if tools is iterable and tools | length > 0 %}
        {{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
    {%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
    {{- "\n\nYou have access to the following functions:\n\n" }}
    {{- "<tools>" }}
    {%- for tool in tools %}
        {%- if tool.function is defined %}
            {%- set tool = tool.function %}
        {%- endif %}
        {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
        {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
        {{- '\n<parameters>' }}
        {%- for param_name, param_fields in tool.parameters.properties|items %}
...
...
...

(Truncated to save people from reading the whole thing)

If this is what it's supposed to be now, then, again, no noticeable improvement for me as of right now.

Regardless, much appreciated efforts from the Unsloth team. Sorry you guys had to go through all this craziness.

Hey, many thanks for all the hard work, just figured I'd drop my roo code setup so others can compare if they're having issues, since I was in the same boat as everyone before with tool calls failing a lot.

Got the latest beta LM Studio + the latest beta CUDA llama.cpp (1.45)

I was using LM Studio via OpenWebUI, but for some reason it wouldn't pick up the new GGUF when I put it in the old LM Studio folder, so step 1 was to re-import the model in a totally different folder (/models/imported-models/Qwen3-Coder-30B-A3B-Instruct/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf).

Re-entered all the recommended settings from the model card.

Configured Roo Code to use an OpenAI-compatible endpoint, pointed it at my OpenWebUI /api endpoint, and picked Qwen3 Coder.

Loaded up a fresh workspace in VS Code and gave it a brief description of a weather API in Rust (nothing crazy).

It did the whole thing: fixed a couple of errors, wrote a readme, wrote integration tests, wrote an example script to call it, and had no tool-calling failures. Can't speak for anyone who has custom MCP servers or similar wired up.

I'll try it with the larger repo I was working with tomorrow and see if it's equally stable, but it does look significantly better. I've been meaning to try Qwen Code, but I won't get a chance until next week.

@Sunderous I was quite skeptical of LM Studio giving me a different result because of how long I've been going at this, but I wanted to give you the benefit of the doubt. I went ahead and downloaded the latest Unsloth Q4 UD for Qwen3-Coder on it and it nailed both my tests first try!

It worked great in RooCode, no errors at all with tool calling.

(Edit: Initially wrote that it was also working great in Qwen Code, and realized I accidentally loaded the wrong model. After confirming I was loading the right model, I got the errors mentioned below.)

The only issues I'm getting:

1. Occasionally after a prompt has finished the model becomes unloaded and throws an error, and I just have to click the "retry" button in RooCode to get it going.

Update: Fixed it by switching from "LM Studio" to "OpenAI Compatible" in RooCode.

2. Not working in Qwen Code for some reason. Maybe, despite the fact that I asked LM Studio to download the Unsloth quant, it is still using a different template?

[screenshot]

But beyond that it is at least working in RooCode.

So... I now have an important question: Why is LM Studio working and Llama.cpp not working with the same exact model in RooCode?

I didn't think the "engine" you used to run a model had any impact on the output quality (speed, of course). So this is quite a shock to me personally.

What is LM Studio doing that's not happening with Llama.cpp? Is LM Studio somehow translating the tool interactions to make them compatible maybe? Is it a different template? Really curious now.

Edit: I figured out how to see what the template is. It's definitely showing the same chat template in LM Studio, so it isn't that.

[screenshot]
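(A hedged aside for anyone making the same comparison against llama.cpp: llama-server reports the template it actually loaded, so you can diff it against what LM Studio shows. Assuming the default host and port, something like

curl http://localhost:8080/props

should return a JSON object whose chat_template field is the active template.)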

Just adding my experience here: I downloaded the latest Q8_K_XL quant (10:39 PM PST, Aug 5; also built latest main-branch llama.cpp around this time) and ran it with:

llama-server -m models/Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf \
      --jinja \
      --host 0.0.0.0 \
      --port 8181 \
      -ngl 99 \
      -c 32768 \
      -b 10240 \
      -ub 2048 \
      --n-cpu-moe 10 \
      -fa \
      -t 24

Here's what I'm getting in Qwen Code (it doesn't work):

[screenshot]

I have tried Qwen3-Coder-30B-A3B-Instruct-UD-Q3_K_XL and Qwen3-Coder-30B-A3B-Instruct-Q3_K_XL with RooCode, Crush, Qwen Code, and OpenCode; none of them works.
I updated the Ollama Docker container to the latest version.

Both of the above models show the same hash:

[screenshot]

OpenCode shows:

{"name": "read", "arguments": {"filePath": "/workspace/CRUSH.md"}}

Crush shows:

[screenshot]

RooCode shows:

[ERROR] You did not use a tool in your previous response! Please retry with a tool use.

Reminder: Instructions for Tool Use

Tool uses are formatted using XML-style tags. The tool name itself becomes the XML tag name. Each parameter is enclosed within its own set of tags. Here's the structure:

<tool_name>
<parameter1_name>value1</parameter1_name>
<parameter2_name>value2</parameter2_name>
...
</tool_name>

For example, to use the attempt_completion tool:

<attempt_completion>
<result>
I have completed the task...
</result>
</attempt_completion>

Always use the actual tool name as the XML tag name for proper parsing and execution.

Next Steps

If you have completed the user's task, use the attempt_completion tool.
If you require additional information from the user, use the ask_followup_question tool.
Otherwise, if you have not completed the task and do not need additional information, then proceed with the next step of the task.
(This is an automated message, so do not respond to it conversationally.)

# VSCode Visible Files

# VSCode Open Tabs

# Current Time

Current time in ISO 8601 UTC format: 2025-08-06T05:41:01.874Z
User time zone: Asia/Calcutta, UTC+5:30

# Current Cost

$0.00

# Current Mode

architect
🏗️ Architect
hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL
You have not created a todo list yet. Create one with update_todo_list if your task is complicated or involves multiple steps.

Qwen Code shows: it started to loop.

[screenshot]

Got the latest beta LM Studio + the latest beta CUDA llama.cpp (1.45)

I was using LM Studio via OpenWebUI, but for some reason it wouldn't pick up the new GGUF when I put it in the old LM Studio folder, so step 1 was to re-import the model in a totally different folder (/models/imported-models/Qwen3-Coder-30B-A3B-Instruct/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf).

The following worked for me (I was previously having issues with multiple agents, Qwen Code, Roo Code, Kilo Code, etc.):

  • Switching LM Studio from latest stable to latest beta
  • I used the model above (it was already downloaded in my LM Studio): qwen3-coder-30b-a3b-instruct@q4_k_xl
  • I made sure to change the model settings: prompt tab, prompt template, and I put in the content from this link.
  • This DID have issues with Qwen Code, which were solved after I asked the LLM what would be wrong with the Jinja template. It told me to remove | safe at two locations, which I did. Since then, everything seems to work correctly.

Does OpenWebUI have this feature to change the chat template?

@belgaied2 Can you share the working Jinja chat template for Qwen Code?

Where is | safe in the current chat template? I don't see it.

After updating the chat_template, it is still not working properly in Roo Code.
This is from Roo Code:
Roo is having trouble...
Roo appears to be stuck in a loop, attempting the same action (read_file) repeatedly. This might indicate a problem with its current strategy. Consider rephrasing the task, providing more specific instructions, or guiding it towards a different approach.

Qwen updated their chat_template in tokenizer_config.json
https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/discussions/14#689450d5592b0f16e20c183a

Maybe this works better.

@shimmyshimmer it says updated a few hours ago? Have the GGUFs been updated?

@shimmyshimmer it says updated a few hours ago? Have the GGUFs been updated?

No, they haven't; check the dates on the files themselves. The last commit just removed the imatrix file.

I'm still encountering issues with tool calling. I'm using Cline and Roo Code with llama.cpp server via the OpenAI-compatible API.
I updated to the latest llama.cpp, Cline/Roo, and the latest GGUFs. I also tried the templates above and they didn't work for me. I haven't tried LM Studio yet, though.
