Qwen3-Coder Tool Calling Fixes
by danielhanchen
Hey everyone! We managed to fix tool calling via llama.cpp `--jinja`, specifically for serving through `llama-server`!
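Once you have an updated GGUF, serving it with the fixed template can be sketched roughly like this. This is a minimal sketch: the model filename below is the single-file UD-IQ1_M upload linked later in this post, and `--jinja` / `--chat-template-file` are the llama.cpp flags discussed here; adjust paths to your setup.

```shell
# Illustrative model path; swap in whichever quant you downloaded.
MODEL="Qwen3-Coder-480B-A35B-Instruct-UD-IQ1_M.gguf"

# Build the llama-server command: --jinja enables Jinja chat-template
# parsing, and --chat-template-file points at the fixed template.
CMD="llama-server --model $MODEL --jinja --chat-template-file chat_template.jinja"

# Inspect the command, then run it once the model file is in place.
echo "$CMD"
```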
PLEASE NOTE: This issue was universal and affected all uploads (not just Unsloth) regardless of source/uploader, and we've communicated with the Qwen team about our fixes!
To get the latest updates, do one of the following:
- Download the first file at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/UD-Q2_K_XL for UD-Q2_K_XL, and replace your current file.
- Use `snapshot_download` as usual, as in https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locally#llama.cpp-run-qwen3-tutorial, which will automatically override the old files.
- Use the new chat template via `--chat-template-file`. See the GGUF chat template or chat_template.jinja.
- As an extra, I also made a single 150GB UD-IQ1_M file (so Ollama works) at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/blob/main/Qwen3-Coder-480B-A35B-Instruct-UD-IQ1_M.gguf
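If you take the `snapshot_download` route, a minimal sketch might look like the following. It assumes `huggingface_hub` is installed; the `quant` and `local_dir` defaults below are illustrative, and the glob pattern simply restricts the download to the chosen quantization's shards so re-running it overwrites the stale files.

```python
REPO_ID = "unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF"

def quant_patterns(quant: str = "UD-Q2_K_XL") -> list[str]:
    """Glob patterns matching only the chosen quantization's files."""
    return [f"*{quant}*"]

def download_fixed_quant(quant: str = "UD-Q2_K_XL",
                         local_dir: str = "Qwen3-Coder-480B-A35B-Instruct-GGUF") -> str:
    """Re-download the fixed GGUF shards, overwriting stale files in local_dir."""
    # Imported lazily so the module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=quant_patterns(quant),
    )
```

Calling `download_fixed_quant()` again after a fix is published pulls only the changed files for that quant rather than the whole repo.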
This should solve issues like https://github.com/ggml-org/llama.cpp/issues/14915