Qwen3-Coder Tool Calling Fixes

#10
by danielhanchen - opened

Hey everyone! We managed to fix tool calling via llama.cpp's --jinja flag, specifically for serving through llama-server!

PLEASE NOTE: This issue was universal and affected all uploads (not just Unsloth's), regardless of source or uploader, and we've communicated our fixes to the Qwen team!

To get the latest updates, do one of the following:

  1. Download the first file at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/UD-Q2_K_XL for UD-Q2_K_XL, and replace your current file
  2. Use snapshot_download as usual, as shown in https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locally#llama.cpp-run-qwen3-tutorial, which will automatically override the old files
  3. Use the new chat template via --chat-template-file. See the GGUF chat template or chat_template.jinja
  4. As an extra, I also made a single 150GB UD-IQ1_M file (so Ollama works) at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/blob/main/Qwen3-Coder-480B-A35B-Instruct-UD-IQ1_M.gguf
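Steps 1–3 above can be sketched as shell commands. This is a hedged sketch, not an exact recipe: it assumes `huggingface-cli` (from huggingface_hub) and a recent llama.cpp build are on your PATH, and the local paths, quant choice (UD-Q2_K_XL), and the `<first-split-file>` placeholder are illustrative:

```shell
# Re-download the fixed UD-Q2_K_XL split files, overwriting the old ones
huggingface-cli download unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF \
  --include "UD-Q2_K_XL/*" \
  --local-dir ./Qwen3-Coder-480B-A35B-Instruct-GGUF

# Serve with the template embedded in the updated GGUF
# (point -m at the first split file; llama.cpp loads the rest automatically)
llama-server --jinja \
  -m ./Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/<first-split-file>.gguf

# ...or keep your existing GGUF and supply the fixed template separately
llama-server --jinja \
  --chat-template-file ./chat_template.jinja \
  -m ./Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/<first-split-file>.gguf
```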

This should solve issues like https://github.com/ggml-org/llama.cpp/issues/14915
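To sanity-check the fix, you can send an OpenAI-style request with a `tools` array to llama-server's /v1/chat/completions endpoint; the Jinja chat template is what renders this schema into the prompt, so a broken template shows up as malformed tool-call output. Below is a minimal, hedged sketch that only builds the request body with the standard library; the model name, port, and `get_weather` tool are illustrative assumptions, not part of the fix itself:

```python
import json

# Illustrative OpenAI-style tool-calling payload. The tool definition here
# (get_weather) is a made-up example for testing purposes.
payload = {
    "model": "qwen3-coder",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)
# POST `body` to http://localhost:8080/v1/chat/completions (llama-server's
# default port); with the fixed template the response should contain a
# well-formed `tool_calls` entry instead of raw template text.
print(body[:80])
```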

