POLL: Does the thinking work for you out of the box?
This model was trained to use the thinking process of DeepSeek R1, on Mistral Small 2501 Base.
My issue is that the "Chat Completion" format doesn't do its job in any of my tools (or at least I can't really be sure).
As you can see on the screenshot on the model card, it works, but I used this system prompt:
You are MistralThinker, a Large Language Model (LLM) created by Undi.\nYour knowledge base was last updated on 2023-10-01. Current date: {date}.\n\nWhen unsure, state you don't know.
This is a conversation between {{user}} and {{char}}. {{user}} talk to {{char}}, and {{char}} reply to {{user}}. {{char}} first thinks about the reasoning process in the mind and then provides {{user}} with a fluid response that is entertaining and following the character, like you were actually conversing with {{user}}, keeping track of time and event happening. Keep it natural and human. The user don't know he's talking to a bot. You have to start with <think>
The first part is the OG system prompt I modified.
The second part is my custom prompt, anything goes here.
At the end, I recall the <think>.
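To be clear about how those parts fit together, here is a rough sketch of the string assembly (plain Python; the {{user}}/{{char}} macros are replaced by your frontend and {date} by the chat template):

```python
# Rough sketch of how the system prompt above is put together.
base = (
    "You are MistralThinker, a Large Language Model (LLM) created by Undi.\n"
    "Your knowledge base was last updated on 2023-10-01. Current date: {date}.\n\n"
    "When unsure, state you don't know."
)
custom = "This is a conversation between {{user}} and {{char}}. ..."  # your own instructions, anything goes
recall = "You have to start with <think>"
system_prompt = base + "\n" + custom + " " + recall
print(system_prompt)
```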
But in the training I never asked in the system prompt to use <think>, so it should come naturally; that's why I tried to prefill <think> in the assistant reply, like DeepSeek did with their distills (to be sure it uses the thinking process).
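For reference, prefilling through a raw text-completion endpoint is simple: you build the Mistral-style prompt yourself and append <think> at the end, so the reply is forced to open the reasoning block. A minimal sketch (plain Python, no specific backend assumed, tokens matching the template below):

```python
# Minimal sketch: build a Mistral-style prompt by hand and prefill "<think>".
# Meant for a raw text-completion endpoint, not Chat Completion.
def build_prefilled_prompt(system_prompt, messages):
    prompt = "<s>[SYSTEM_PROMPT]" + system_prompt + "[/SYSTEM_PROMPT]"
    for msg in messages:
        if msg["role"] == "user":
            prompt += "[INST]" + msg["content"] + "[/INST]"
        elif msg["role"] == "assistant":
            prompt += msg["content"] + "</s>"
    # The prefill: the reply already starts with <think>, so the model just
    # continues the reasoning block instead of deciding whether to open one.
    return prompt + "<think>"

print(build_prefilled_prompt(
    "You are MistralThinker...",  # your system prompt
    [{"role": "user", "content": "Hello, who are you?"}],
))
```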
Here is the Jinja chat template you can find in this GGUF and in the unquantized repo.
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set today = strftime_now(\"%Y-%m-%d\") %}{% set default_system_message = \"You are MistralThinker, a Large Language Model (LLM) created by Undi.\\nYour knowledge base was last updated on 2023-10-01. Current date: \" + today + \".\\n\\nWhen unsure, state you don't know.\" %}{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% set loop_messages = messages[1:] %}{% else %}{% set system_message = default_system_message %}{% set loop_messages = messages %}{% endif %}[SYSTEM_PROMPT]{{ system_message }}[/SYSTEM_PROMPT]{% for message in loop_messages %}{% if message['role'] == 'user' %}[INST]{{ message['content'] }}[/INST]{% elif message['role'] == 'assistant' %}{% set content = message['content'] %}{% if '</think>' in content %}{% set parts = content.split('</think>') %}{{ parts[0] + '</think>' }}{% set content = parts[-1] %}{% endif %}{{ content + eos_token }}{% elif message['role'] == 'system' %}[SYSTEM_PROMPT]{{ message['content'] }}[/SYSTEM_PROMPT]{% else %}{{ raise_exception('Invalid role') }}{% endif %}{% endfor %}{% if add_generation_prompt %}<think>{% endif %}"
DOES MY CHAT COMPLETION JINJA WORK FOR YOUR USAGE? And what is your back end/front end?
If you know this shit better than me, feel free to fix it on the e1 or e2 repo.
Thanks!
A new version is on the way with a bigger dataset and a fix for the <think>.
I still haven't found a fix for the Jinja issue.
I spoke with some Kobold members and apparently each back end uses the Chat Completion API differently.
For example, using Chat Completion in KoboldCpp, <think> isn't prefilled at all but is deleted from the current context, whereas on vLLM it IS prefilled, but nowhere to be seen in the output sent back (so the thinking format isn't complete and it doesn't display properly).
TL;DR: This model works, but implementing it "out of the box" for everyone means these two steps:
- Deleting the thinking out of the context
- Prefilling <think> in the reply
and that seems to be hard lmao. I don't see any other way than Chat Completion (there's a rough sketch of the first step at the end of this post).
I shared everything needed to make it run.
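In the meantime, if a front end wants to handle the first step (removing old reasoning from the context) on its side, here is a rough client-side sketch of what I mean, not tied to any specific tool:

```python
import re

# Strip <think>...</think> blocks from previous assistant turns, so old
# reasoning doesn't pile back into the context sent with Chat Completion.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(messages):
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned
```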