Tool calling
Hi, thank you for the GGUF model!
It seems there is an issue somewhere with tool calling compared to the 3.1 version.
Test prompt:
read init.sh
Response:
<file contents> and a relevant summary of it.
This is an existing file. It used to work with 3.1:
hf.co/bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF:Q5_K_L
With 3.2 it does not work:
hf.co/bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K
Test prompt:
read init.sh
Response:
[TOOL_CALLS]builtin_read_file<SPECIAL_32>{"filepath": "init.sh"}
Is this a model configuration issue, or should I raise an issue with the extension developer?
I've double-checked the model system prompts; they are the same.
The extension is ContinueDev for VS Code.
Cross-discussion on Discord:
https://discord.com/channels/1108621136150929458/1385904830198845541
They did change something with tool calling and I tried to follow along but may not have gotten it perfect D:
@bartowski
Thank you for the reply!
Are you planning to have a look, or is it out of scope? It's all magic to me; otherwise I would offer to help.
The biggest issue at this time is not being sure whether the template is wrong or whether the tool-calling tools need updating to support it, since in the model update they explicitly called out a new template that works better for their model.
Hey
@bartowski
, I've just checked the version from Unsloth and it seems to work fine with tools. I guess it's open-sourced as well, so the fix is somewhere out there: hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K
I have some fixes I'm testing locally, but they don't work 100% of the time with llama.cpp, so I'm trying to figure out why before re-pushing all the files.
I think it's more proper though, so I will try to release it ASAP!
Okay, I think I have the fix, though it's going to require specifying the .jinja file, which I'll commit to llama.cpp where the others exist.
Not sure why it doesn't work without the .jinja file, since it's identical to what I'm putting in the model; there must be something funky about it!
Now it'll just be "yesterday" that doesn't work, but maybe I'll try to fix that in minja itself.
$ ollama pull hf.co/bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q5_K_L
pulling manifest
Error: pull model manifest: 400: {"error":"The specified tag is not a valid quantization scheme. Please use another tag or \"latest\""}
I will download Q6 and report back.
@bartowski
Unfortunately, same output. 🤷‍♂️
[TOOL_CALLS]builtin_read_file<SPECIAL_32>{"filepath": "init.sh"}
So that's interesting; I saw the same in LM Studio, but llama.cpp is perfect.
Also that's the model itself doing something silly, so I don't really know what it's doing there 🤔
First of all, I'm super excited about you guys' new model coming out. That's awesome, congratulations!
I've been testing it out and trying to debug for a couple of hours now. I think I solved it? Unsloth's works if you also manually download their Jinja and load it; the newest quants fail without it. So that's good, it works that way at least.
The template from your pull request would not work for me on a fully updated llama.cpp, built just prior to posting this. I don't know if it's my workflow and the programs I'm using; it very well could be. It's late, so maybe I overlooked something, or I'm a little cursed. It's baffling that it works fine for you; something is oddly wonky.
I rewrote your Jinja, modifying a few things. If it's not just my environment that's the issue, feel free to grab anything useful from it. It may also be complete crap; in that case, I apologize to anyone whose eyes had the unfortunate luck of looking at it.
Just to show it works, I'll drop the error log from your PR in a paste, as it is rather long, and the Jinja will be in there too. (If anyone's curious, the program is https://crates.io/crates/aichat - awesome stuff.)
~ v3.13.3 took 7s
❯ ai --model llamacpp:missy --role %functions% 'what is the weather in Ireland?'
Call get_current_weather {"location":"Ireland."}
The weather in Ireland is currently cloudy with a temperature of 15°C and a wind speed of 4.4 meters per second.
~ v3.13.3 took 4s
❯ ai --agent todo list all my todos
Here are your todos:
- Watch a movie
- Watch a movie
- Buy milk
- I'm feeling confident today, up my copyright of 144 lines of code containing only elif statements to 200 lines of elif statements. Note for future
me: Your doing great.
~ v3.13.3 took 10s
❯ ai --model llamacpp:missy --role %functions% 'what time is it?'
Call get_current_time {}
It is currently redacted 2025.
The updated Jinja:
https://privatebin.net/?a5ebe608aae04aad#3kbftvnP917CNckhXKDwUfdu5o6WH2dxTm6RUSnv5SMA
And error logs from yours, as well as both full llamacpp info:
https://privatebin.net/?4a47c80fc919c41b#5HyvuMvQRo95nCY84ijLGMRvD1MnQDJU4V2xEMX9STrW
Hmmm I'll take a look in a bit!
If yours works fine, are you okay with me using it for the chat template as well? Where did the fixes come from?
Ah, I see the main thing you did was remove the != 9 check... I mean, in theory that's probably fine, but the Mistral template very specifically validates that the call ID be of length 9, so I don't know if it's really proper to remove that:
it may make a tool work, but I think it's more appropriate that the tool be updated, especially for proper support of what Mistral has been trained on.
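For reference, the check I mean looks roughly like this in the Mistral-style template (a paraphrased sketch from memory, so the exact wording and variable names in the shipped file may differ):

{#- Paraphrased sketch, not the verbatim shipped template -#}
{%- if tool_call.id is defined and tool_call.id|length != 9 %}
    {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}

So the template expects the client to generate 9-character alphanumeric call IDs, rather than being loosened to accept anything.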
Hey,
many thanks for creating all these awesome gourmet GGUFs :)
Interesting discussion about the new tool-call format in Mistral Tekken v11.
If the Jinja template formats the chat history in Mistral Tekken v11 (the new tool-call syntax for Mistral Small 3.2), would the llama.cpp source code also have to provide an additional internal grammar (alongside the existing Nemo Tekken v3 and old Small Tekken v7 ones) for the new tool-call format (Mistral Small 3.2, i.e. Mistral Tekken v11)?
How could this be detected, given that Tekken v3/v7 is currently triggered by the template containing [TOOL_CALLS] (see [1])?
Could it be distinguished by checking whether the template also contains the mandatory [ARGS] (in v11), while ignoring whether it contains [CALL_ID], which may be optional in Mistral Tekken v11 (at least in their unit tests, see [2])?
If I understand correctly, right now llama.cpp forces Mistral Small 3.2 to output tool calls in the Mistral Tekken v3/v7 format (like Nemo, see [3]), while the Jinja template later formats these tool calls in the chat history to appear in the new Mistral Tekken v11 format.
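To make the contrast concrete, here is my rough understanding of the two renderings for a single tool call, pieced together from this thread and the mistral-common tests; it's a sketch, not verbatim template output, and I'm not sure about the exact placement of [CALL_ID] in v11:

{#- Tekken v3/v7 (Nemo / Small 3.1): a single JSON array after [TOOL_CALLS] -#}
[TOOL_CALLS][{"name": "builtin_read_file", "arguments": {"filepath": "init.sh"}, "id": "abc123def"}]

{#- Tekken v11 (Small 3.2): tool name, then [ARGS] plus the JSON arguments; [CALL_ID] seems optional.
    The [ARGS] token may be what shows up as <SPECIAL_32> in the raw output quoted above. -#}
[TOOL_CALLS]builtin_read_file[ARGS]{"filepath": "init.sh"}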
Curious to hear your thoughts about this,
Cheers.
References:
[1] https://github.com/ggml-org/llama.cpp/blob/06cbedfca1587473df9b537f1dd4d6bfa2e3de13/common/chat.cpp#L1794
[2] https://github.com/mistralai/mistral-common/blob/main/tests/test_tokenizer_v11.py
[3] https://github.com/ggml-org/llama.cpp/blob/06cbedfca1587473df9b537f1dd4d6bfa2e3de13/common/chat.cpp#L867