RooCode Performance Is Very Poor; Thinking & Non-Thinking Models Performing Way Better
I've done a lot of testing with Q3 - Q6 UD quants for Coder and it's all been really disappointing in RooCode: so many errors and so much endless looping. I also downloaded the static Q4_K_M quant, and that was the worst of all: it didn't write any code in the editor, just wrote it in the chatbox instead, didn't call any tools whatsoever, and then claimed its work was finished.
It's really incredibly strange, because the Thinking and Non-Thinking models released this past week are genuinely performing better in every respect compared to Coder (at least in terms of working alongside RooCode).
My settings (RTX 4090, Linux, llama.cpp):
"Qwen3-Coder-30B-A3B-Instruct-UD-Q6KXL":
cmd: |
llama-server
-m /mnt/big/AI/models/llamacpp/Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf
--port ${PORT}
--flash-attn # tried with and without
--threads 16
--gpu-layers 33 # modifying this by a few steps can dramatically change output
--ctx-size 65536
--temp 0.7
--top-k 20
--top-p 0.8
--min-p 0.0
--repeat-penalty 1.05
--cache-type-k q8_0 # tried with and without
--cache-type-v q8_0 # tried with and without
--jinja
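(For what it's worth, the same sampling settings can also be sent per request through the OpenAI-compatible endpoint rather than baked into the server command. A minimal sketch; the extra_body keys are my assumption of what llama.cpp's server accepts as extra JSON fields:)

from openai import OpenAI

# Adjust the port to whatever ${PORT} resolves to in your setup.
client = OpenAI(base_url="http://127.0.0.1:8081/v1", api_key="sk-no-key-required")

completion = client.chat.completions.create(
    model="Qwen3-Coder-30B-A3B-Instruct-UD-Q6KXL",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.7,
    top_p=0.8,
    # Assumed extra llama.cpp sampling fields; not part of the OpenAI schema.
    extra_body={"top_k": 20, "min_p": 0.0, "repeat_penalty": 1.05},
)
print(completion.choices[0].message.content)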
I'm just wondering if anyone else is seeing this in general or at least in terms of their RooCode usage? Is there a problem with the quants, or was the coder model just released in a poor state?
(Worth mentioning that I'm using llama.cpp with the settings exactly as described on the Unsloth website. I've also experimented with enabling/disabling things, and while the results do change... dramatically... I still get tool-calling errors and endless loops.)
Two things I have noticed in particular:
If the Coder model is creating a file from scratch (meaning the file didn't previously exist and it creates it), I tend to get better outcomes. If I create a blank file and ask it to add code to that file, it can get rough (especially if it goes hunting for other files to reference, even when I ask it not to).
The more files the Coder model engages with (and I'm not talking about a lot here, maybe 1 file vs. 4 or 5 files), the worse the tool-calling experience seems to get. If I ask it to only work on a single file, it often works a bit better. But even then it doesn't work every single time, and sometimes it ignores my request and decides to reference other files even when I explicitly ask it to "only reference and edit this abc.py file".
We're working on a fix and will update you guys ASAP on the issue!
Incredible. Really appreciate all the efforts! Looking forward to the updates!
@JamesMowery btw, can you try the below to see if it works as expected? We confirmed the below for now works as intended:
(Use tmux to load the llama.cpp server on one side, then CTRL+B, D to detach. To get it back: tmux attach-session -t0)
./llama.cpp/llama-server \
--model unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
--alias "unsloth/Qwen3-Coder-30B-A3B-Instruct" \
--jinja \
--device CUDA0 \
--log-verbosity 99 \
--port 8001
and with Python code calling the model:
from openai import OpenAI
import json

openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
    messages = [{"role": "user", "content": "What is 2+2?"},],
)
print(completion.choices[0].message.content)

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }

def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    elif name == "get_temperature_date":
        return get_temperature_date
    else:
        raise RuntimeError(f"No function named {name}")

weather_tool_calls = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]

messages = [
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow? Today's date is 2024-09-30."},
]
completion = openai_client.chat.completions.create(
    model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
    messages = messages,
    tools = weather_tool_calls,
)
print(completion.choices[0].message.tool_calls)
message = completion.choices[0].message
messages.append(message)
while len((tool_call := message.tool_calls) or []) != 0:
    tool_call = tool_call[0]
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    result = get_function_by_name(function_name)(**arguments)
    print(function_name, result)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result),
    })
    completion = openai_client.chat.completions.create(
        model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
        messages = messages,
        tools = weather_tool_calls,
    )
    message = completion.choices[0].message
    messages.append(message)  # (added) keep the assistant's tool_calls in the history for the next request
print(message.content)
you should get:
2 + 2 = 4
[ChatCompletionMessageToolCall(id='x1NNOuJijQ3sYh3Gmy9voH3KqGmOAcxs', function=Function(arguments='{"location":"San Francisco, California, USA","unit":"celsius"}', name='get_current_temperature'), type='function')]
get_current_temperature {'temperature': 26.1, 'location': 'San Francisco, California, USA', 'unit': 'celsius'}
get_temperature_date {'temperature': 25.9, 'location': 'San Francisco, California, USA', 'date': '2024-10-01', 'unit': 'celsius'}
The current temperature in San Francisco is 26.1°C.
For tomorrow, October 1, 2024, the temperature is expected to be 25.9°C.
So I did this (let me know if this is still correct):
llama-server \
--model /mnt/big/AI/models/llamacpp/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
--alias "unsloth/Qwen3-Coder-30B-A3B-Instruct" \
--jinja \
--device CUDA0 \
--log-verbosity 99 \
--port 8081
Output:
uv run unsloth.py
2 + 2 = 4
[ChatCompletionMessageToolCall(id='mMcnQBW8XvoktpDfa9hc3E0lw6ndKjxd', function=Function(arguments='{"location":"San Francisco, California, USA"}', name='get_current_temperature'), type='function')]
get_current_temperature {'temperature': 26.1, 'location': 'San Francisco, California, USA', 'unit': 'celsius'}
get_temperature_date {'temperature': 25.9, 'location': 'San Francisco, California, USA', 'date': '2024-10-01', 'unit': 'celsius'}
The current temperature in San Francisco is 26.1°C.
For tomorrow, the temperature is expected to be 25.9°C.
So that looks okay I believe.
Is that supposed to fix RooCode though? Because it actually made it even worse. It didn't use the tasklist, it didn't use any tools again. It only wrote things in chat (I killed it early).
My prompt:
Create a rampart game in the rampart.py file using Pygame. Only read, reference, and use the rampart.py file.
Output:
I'll create a rampart game using Pygame in the rampart.py file. Let me first check what's already in the file to understand the current state.
I need to:
1. First check the current content of rampart.py to see what's already there
2. Create a rampart game using Pygame
3. Make sure I only use and reference the rampart.py file as requested
Let me start by reading the rampart.py file.
<read_file path="rampart.py"/>
I need to create a rampart game using Pygame in the rampart.py file. Let me first check what's already in the file to understand the current state.
<read_file path="rampart.py"/>
I'll create a rampart game using Pygame in the rampart.py file. Let me implement the game with the necessary components.
import pygame
import sys
import random
...
...
...
It missed the read_file tool call twice at the start, and I killed it shortly thereafter.
If I load up the Non Thinking model it will create the task list, update the task list each step, call every tool without issue, write code in the editor, and work beautifully.
Coder with this fix still just does the same thing where it writes all the code in the chat and doesn't interact at all with VS Code's editor.
In another test, I asked it to create a file from scratch with a simple snake game, and it just did it, but it still didn't really use the task list correctly in RooCode (it just kind of printed a task list after the fact and didn't even present it to me beforehand); it didn't behave like what you would expect to see in RooCode at all.
Again, everything about the Think and Non-Think models from earlier this week just blows Coder out of the water in RooCode. It's just so odd that it's so different and worse now.
I also attempted to load this exact setup in Charm's new "Crush", the Claude Code-esque tool they just released today that everyone is going crazy about.
Qwen3 Non Thinking works perfectly fine.
The fix still fails for Qwen3 Coder.
There's lots of gibberish I can't decipher in the log when I just say "Hello" to it. It just infinitely spews out the following for Coder:
srv log_server_r: request: GET /health 127.0.0.1 200
[INFO] <Qwen3-Coder-30B-A3B-Instruct-UD-Q4KXL> Health check passed on http://localhost:5825/health
[DEBUG] <Qwen3-Coder-30B-A3B-Instruct-UD-Q4KXL> swapState() State transitioned from starting to ready
got exception: {"code":500,"message":"Value is not callable: null at row 62, column 114:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 62, column 21:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 61, column 55:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 61, column 17:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 60, column 48:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 60, column 13:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 49, column 80:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 49, column 9:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 42, column 29:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 42, column 5:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 39, column 51:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n ^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 39, column 1:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 1, column 69:\n{#- Copyright 2025-present the Unsloth team. All rights reserved. #}\n ^\n{#- Licensed under the Apache License, Version 2.0 (the \"License\") #}\n","type":"server_error"}
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 500
[DEBUG] <Qwen3-Coder-30B-A3B-Instruct-UD-Q4KXL> request /v1/chat/completions - start: 10.42166576s, total: 10.432124464s
[INFO] Request ::1 "POST /v1/chat/completions HTTP/1.1" 500 3198 "OpenAI/Go 1.11.1" 22.339064718s
[DEBUG] Exclusive mode for group (default), stopping other process groups
got exception: {"code":500,"message":"Value is not callable: null at row 62, column 114:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 62, column 21:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 61, column 55:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 61, column 17:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 60, column 48:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 60, column 13:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 49, column 80:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 49, column 9:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 42, column 29:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 42, column 5:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 39, column 51:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n ^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 39, column 1:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 1, column 69:\n{#- Copyright 2025-present the Unsloth team. All rights reserved. #}\n ^\n{#- Licensed under the Apache License, Version 2.0 (the \"License\") #}\n","type":"server_error"}
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 500
[DEBUG] <Qwen3-Coder-30B-A3B-Instruct-UD-Q4KXL> request /v1/chat/completions - start: 0s, total: 1.213853ms
[INFO] Request ::1 "POST /v1/chat/completions HTTP/1.1" 500 3198 "OpenAI/Go 1.11.1" 1.468444ms
[DEBUG] Exclusive mode for group (default), stopping other process groups
got exception: {"code":500,"message":"Value is not callable: null at row 62, column 114:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 62, column 21:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 61, column 55:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 61, column 17:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 60, column 48:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 60, column 13:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 49, column 80:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 49, column 9:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 42, column 29:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 42, column 5:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 39, column 51:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n ^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 39, column 1:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 1, column 69:\n{#- Copyright 2025-present the Unsloth team. All rights reserved. #}\n ^\n{#- Licensed under the Apache License, Version 2.0 (the \"License\") #}\n","type":"server_error"}
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 500
[DEBUG] <Qwen3-Coder-30B-A3B-Instruct-UD-Q4KXL> request /v1/chat/completions - start: 0s, total: 1.448614ms
[INFO] Request ::1 "POST /v1/chat/completions HTTP/1.1" 500 3198 "OpenAI/Go 1.11.1" 1.566744ms
[DEBUG] Exclusive mode for group (default), stopping other process groups
got exception: {"code":500,"message":"Value is not callable: null at row 62, column 114:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 62, column 21:\n {%- if json_key not in handled_keys %}\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n ^\n {%- if param_fields[json_key] is mapping %}\n at row 61, column 55:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 61, column 17:\n {%- for json_key in param_fields %}\n {%- if json_key not in handled_keys %}\n ^\n {%- set normed_json_key = json_key | replace(\"-\", \"_\") | replace(\" \", \"_\") | replace(\"$\", \"\") %}\n at row 60, column 48:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 60, column 13:\n {%- set handled_keys = ['type', 'description', 'enum', 'required'] %}\n {%- for json_key in param_fields %}\n ^\n {%- if json_key not in handled_keys %}\n at row 49, column 80:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 49, column 9:\n {{- '\\n<parameters>' }}\n {%- for param_name, param_fields in tool.parameters.properties|items %}\n ^\n {{- '\\n<parameter>' }}\n at row 42, column 29:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 42, column 5:\n {{- \"<tools>\" }}\n {%- for tool in tools %}\n ^\n {%- if tool.function is defined %}\n at row 39, column 51:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n ^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 39, column 1:\n{%- endif %}\n{%- if tools is iterable and tools | length > 0 %}\n^\n {{- \"\\n\\nYou have access to the following functions:\\n\\n\" }}\n at row 1, column 69:\n{#- Copyright 2025-present the Unsloth team. All rights reserved. #}\n ^\n{#- Licensed under the Apache License, Version 2.0 (the \"License\") #}\n","type":"server_error"}
Infinite repetition of the above... stuck in an infinite loop...
Conversely, here's the output for Qwen Non Thinking (I also confirmed it's working with the Q4 UD Non Thinking model, for fair quant to quant comparisons, but here's the output for the Q6 Non Thinking):
srv log_server_r: request: GET /health 127.0.0.1 200
[INFO] <Qwen3-30B-A3B-Instruct-2507-UD-Q6KXL> Health check passed on http://localhost:5816/health
[DEBUG] <Qwen3-30B-A3B-Instruct-2507-UD-Q6KXL> swapState() State transitioned from starting to ready
srv params_from_: Chat format: Hermes 2 Pro
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 10751
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.190494
slot update_slots: id 0 | task 0 | kv cache rm [2048, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.380988
slot update_slots: id 0 | task 0 | kv cache rm [4096, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 6144, n_tokens = 2048, progress = 0.571482
slot update_slots: id 0 | task 0 | kv cache rm [6144, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 8192, n_tokens = 2048, progress = 0.761976
slot update_slots: id 0 | task 0 | kv cache rm [8192, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 10240, n_tokens = 2048, progress = 0.952470
slot update_slots: id 0 | task 0 | kv cache rm [10240, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 10751, n_tokens = 511, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 10751, n_tokens = 511
slot release: id 0 | task 0 | stop processing: n_past = 10760, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 14212.24 ms / 10751 tokens ( 1.32 ms per token, 756.46 tokens per second)
eval time = 561.83 ms / 10 tokens ( 56.18 ms per token, 17.80 tokens per second)
total time = 14774.07 ms / 10761 tokens
srv update_slots: all slots are idle
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
[DEBUG] <Qwen3-30B-A3B-Instruct-2507-UD-Q6KXL> request /v1/chat/completions - start: 15.271124006s, total: 30.122473097s
And it very quickly gets a response.
@JamesMowery thanks for letting us know more about the issue! As an update, good news: we think we fixed it!!
The culprit seems to be that other systems (Roo Code etc.) don't use the recommended ["type", "enum", "description"] keys and use something else, so the below would fail:
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": 'The unit to return the temperature in. Defaults to "celsius".',
"********1" : "********************1",
"********2" : ["********************2"],
"********3" : {"********************3" : "**", "********************3" : "**"},
"********4" : None,
},
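(In the meantime, one client-side workaround is to strip the unexpected keys before handing tools to the server. A minimal sketch with a hypothetical sanitize_tools helper; the allowed-key set follows the recommended ["type", "enum", "description"] above, plus "required":)

import copy

# Hypothetical workaround: keep only the schema keys the chat template expects.
ALLOWED_KEYS = {"type", "enum", "description", "required"}

def sanitize_tools(tools):
    cleaned = copy.deepcopy(tools)
    for tool in cleaned:
        props = tool.get("function", {}).get("parameters", {}).get("properties", {})
        for name, fields in props.items():
            props[name] = {k: v for k, v in fields.items() if k in ALLOWED_KEYS}
    return cleaned

# Usage: pass sanitize_tools(weather_tool_calls) as the tools= argument.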
I have now updated the template, and can verify it works on my side - please verify and see if the new chat template works!
You do NOT need to download the model again. Instead, download the new template via:
hf download unsloth/Qwen3-Coder-30B-A3B-Instruct chat_template.jinja --local-dir unsloth
(or fetch chat_template.jinja from the model repo with wget), then:
./llama.cpp/llama-server \
--model unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
--alias "unsloth/Qwen3-Coder-30B-A3B-Instruct" \
--jinja \
--threads -1 \
--n-gpu-layers 999 \
--device CUDA0 \
--min_p 0.00 \
--log-verbosity 99 \
--port 8001 \
--chat-template-file unsloth/chat_template.jinja
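(To double-check the server actually picked up the new file, you can inspect the active chat template; this assumes, as in recent llama.cpp builds, that GET /props reports it:)

import requests

# Assumption: llama.cpp's server exposes the loaded chat template at /props.
props = requests.get("http://127.0.0.1:8001/props").json()
print(props.get("chat_template", "")[:200])  # should show the updated Unsloth template header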
then test it:
from openai import OpenAI
import json

openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
    messages = [{"role": "user", "content": "What is 2+2?"},],
)
print(completion.choices[0].message.content)

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }

def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    elif name == "get_temperature_date":
        return get_temperature_date
    else:
        raise RuntimeError(f"No function named {name}")

weather_tool_calls = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                        "********1" : "********************1",
                        "********2" : ["********************2"],
                        "********3" : {"********************3" : "**", "********************3" : "**"},
                        "********4" : None,
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]

messages = [
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow? Today's date is 2024-09-30."},
]
completion = openai_client.chat.completions.create(
    model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
    messages = messages,
    tools = weather_tool_calls,
)
print(completion.choices[0].message.tool_calls)
message = completion.choices[0].message
messages.append(message)
while len((tool_call := message.tool_calls) or []) != 0:
    tool_call = tool_call[0]
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    result = get_function_by_name(function_name)(**arguments)
    print(function_name, result)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result),
    })
    completion = openai_client.chat.completions.create(
        model = "unsloth/Qwen3-Coder-30B-A3B-Instruct",
        messages = messages,
        tools = weather_tool_calls,
    )
    message = completion.choices[0].message
    messages.append(message)  # (added) keep the assistant's tool_calls in the history for the next request
print(message.content)
you should get:
2 + 2 = 4
[ChatCompletionMessageToolCall(id='7AUqc1Qm1qFHYNddU3PBhmEkmoQj2HE1', function=Function(arguments='{"location":"San Francisco, California, USA"}', name='get_current_temperature'), type='function')]
get_current_temperature {'temperature': 26.1, 'location': 'San Francisco, California, USA', 'unit': 'celsius'}
get_temperature_date {'temperature': 25.9, 'location': 'San Francisco, California, USA', 'date': '2024-10-01', 'unit': 'celsius'}
The current temperature in San Francisco is 26.1°C. Tomorrow's temperature is expected to be 25.9°C.
Sorry about the issue, and hopefully this will fix it! Please test it and get back to me - I'll try my best to fix it ASAP if more issues arise!
llama.cpp 6054
Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf
using the updated 7056-byte jinja
RooCode 3.25.4 still fails with function calls.
It's either Roo's error (it got the model's intention to read a file, but fails):
Roo wants to read this file:
plan.md
Roo is having trouble...
Roo appears to be stuck in a loop, attempting the same action (read_file) repeatedly. This might indicate a problem with its current strategy. Consider rephrasing the task, providing more specific instructions, or guiding it towards a different approach.
Or the model's error (it didn't use a function call):
I'll use the read_file tool to examine the contents of plan.md.
I need to review the plan that was created. Let me first check the content of the plan.md file to understand what has been planned so far.
Roo is having trouble...
This may indicate a failure in the model's thought process or inability to use a tool properly, which can be mitigated with some user guidance (e.g. "Try breaking down the task into smaller steps").
I posted on the other thread (got the threads switched up accidentally; my apologies) but yeah I have the same error Smoked posted.
I posted a lot more logs and screenshots on the other thread.
Fwiw, I had the same issues with different quants, swapped over to the official api from alibaba modelstudio and the model was still erroring out in roocode.
Uh oh. Well that is REALLY unfortunate. Maybe the new coder model is genuinely busted? It's just so weird how this is possible though considering how great the other two models released this week are doing. Surely this regression in tool calling can't be intended.
I am testing it now through the API in the qwen-code CLI app and it is performing fine, though the RooCode issues with the API do make me wonder whether the quants are messed up or it's something upstream that would fall on the model. Regardless, I don't possess the knowledge to speak definitively on this, so I am just guessing based off what I experienced with the API behaving the same as the local quant.
I'm just wondering if anyone else is seeing this in general or at least in terms of their RooCode usage? Is there a problem with the quants, or was the coder model just released in a poor state?
Testing Q4 and Q6 here; it works fine most of the time, but sometimes it makes tool calls in RooCode without the proper format... I don't think this is a problem with the chat template, though: tools in RooCode use a custom format with XML tags, not the tools in the OpenAI-style API... https://docs.roocode.com/basic-usage/how-tools-work
Not sure if related, but there is also this issue on llama.cpp and I saw similar problems with tool calling (but not exactly the same ones) using Devstral: https://github.com/ggml-org/llama.cpp/issues/14697
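(To illustrate the mismatch: RooCode expects XML-style tool tags emitted as plain text in the assistant message, while the OpenAI-style API returns structured tool_calls. A rough sketch, with the read_file tag shape taken from the examples earlier in this thread:)

import re

# RooCode-style tool use: plain-text XML tags inside the message body
# (tag shape as seen in the rampart.py transcript above).
roocode_reply = '<read_file path="rampart.py"/>'

# OpenAI-style tool use: a structured tool_calls entry, no tags in the text.
openai_reply = {
    "tool_calls": [{
        "type": "function",
        "function": {"name": "read_file", "arguments": '{"path": "rampart.py"}'},
    }]
}

# A reply that satisfies one convention looks like a failure under the other,
# which is why server-side tool parsing (--jinja) and RooCode can disagree.
print(bool(re.search(r"<read_file\b", roocode_reply)))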
Running qwen3-coder-30b-a3b-instruct-mlx@8bit on a 48GB MacBook M4 Pro with 256k context, it's running at around 50 tokens per second, which is just amazing. I've loaded it in LM Studio and used it from my desktop machine in Roo Code (who would have thought a laptop could be a server). It ran well when I gave it a prompt to write an SVG editor in code mode, but when trying it from the orchestrator mode I ran into problems when it switched to code mode.
Roo tried to use write_to_file without value for required parameter 'path'. Retrying...
Roo tried to use apply_diff without value for required parameter 'path'. Retrying...
Roo is having trouble...
This may indicate a failure in the model's thought process or inability to use a tool properly, which can be mitigated with some user guidance (e.g. "Try breaking down the task into smaller steps").
I switched to Q6, but it sometimes adds an "X" after the filename when giving terminal commands. Eventually Q6 failed too; it switched to trying to update the files from PowerShell with Set-Content :-) points for effort.
I've been trying to prompt Qwen3-Coder with the above fixes, and getting a lot of core dumps.
<tool_call>
<function=platform__
srv send: sending result for task id = 1028
srv send: task id = 1028 pushed to result queue
slot process_toke: id 0 | task 1028 | n_decoded = 83, n_remaining = -1, next token: 563 '__'
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1111
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1112, front = 0
slot update_slots: id 0 | task 1028 | slot decode token, n_ctx = 4096, n_past = 2497, n_cache_tokens = 2497, truncated = 1
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"__"}}],"created":1754156158,"id":"chatcmpl-2AVuzzHE6yBUJuyHSDf1j4e8zz1i3wzq","model":"unsloth/Qwen3-Coder-30B-A3B-Instruct","system_fingerprint":"b6060-9c35706b","object":"chat.completion.chunk"}
Grammar still awaiting trigger after token 1607 (`list`)
srv update_chat_: Parsing chat message: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list
Parsing input with format Hermes 2 Pro: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list
srv send: sending result for task id = 1028
srv send: task id = 1028 pushed to result queue
slot process_toke: id 0 | task 1028 | n_decoded = 84, n_remaining = -1, next token: 1607 'list'
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1112
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1113, front = 0
slot update_slots: id 0 | task 1028 | slot decode token, n_ctx = 4096, n_past = 2498, n_cache_tokens = 2498, truncated = 1
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"list"}}],"created":1754156158,"id":"chatcmpl-2AVuzzHE6yBUJuyHSDf1j4e8zz1i3wzq","model":"unsloth/Qwen3-Coder-30B-A3B-Instruct","system_fingerprint":"b6060-9c35706b","object":"chat.completion.chunk"}
Grammar still awaiting trigger after token 35569 (`_resources`)
srv update_chat_: Parsing chat message: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list_resources
Parsing input with format Hermes 2 Pro: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list_resources
srv send: sending result for task id = 1028
srv send: task id = 1028 pushed to result queue
slot process_toke: id 0 | task 1028 | n_decoded = 85, n_remaining = -1, next token: 35569 '_resources'
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1113
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1114, front = 0
slot update_slots: id 0 | task 1028 | slot decode token, n_ctx = 4096, n_past = 2499, n_cache_tokens = 2499, truncated = 1
srv update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"_resources"}}],"created":1754156158,"id":"chatcmpl-2AVuzzHE6yBUJuyHSDf1j4e8zz1i3wzq","model":"unsloth/Qwen3-Coder-30B-A3B-Instruct","system_fingerprint":"b6060-9c35706b","object":"chat.completion.chunk"}
Grammar triggered on regex: '<function=platform__list_resources>
'
srv update_chat_: Parsing chat message: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list_resources>
Parsing input with format Hermes 2 Pro: I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list_resources>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 2, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal: <<<
>>>
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid diff: 'I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
<function=platform__list_resources' not found at start of 'I apologize for the confusion. It seems I'm having trouble accessing the project summary. Let me try a different approach to provide you with information about the current project.
Since I'm unable to directly access project files, I'll need to rely on the information available through my tools. Let me check if there's any general information I can provide about the project structure or current status.
<tool_call>
'
Aborted (core dumped)
I don't think this is isolated to Roo Code; I'm experiencing this in qwen-code too. It will just stop at what seems to be the point where it would want to make a tool call:
✦ I'll analyze the project structure and create a QWEN.md documentation file for myself.
First, let me examine the key files to understand what this project is about.
for example. I've tried different UD quants and context lengths up to 64k.
I'm using Ollama as the LLM endpoint.
Same behavior with opencode + Ollama.