New Chat Template Fixes as of Aug 8, 2025:
In case some of you were wondering:
- The Jinja template had stray \n characters, didn't parse thinking sections, and didn't render tool calls correctly
- Some versions are missing <|channel|>final -> this is a must! (see the sketch after this list)
- Pure F16 conversions overflow to infs: use the F32+BF16 versions instead!
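For reference, a minimal sketch of what a correctly rendered Harmony turn should look like (placeholder message content; token names are from the gpt-oss Harmony format), with the actual answer landing on the final channel:

```
<|start|>assistant<|channel|>analysis<|message|>...chain of thought...<|end|>
<|start|>assistant<|channel|>final<|message|>...the actual answer...<|return|>
```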
More details in our blog/guide: https://docs.unsloth.ai/basics/gpt-oss
Fine-tuning is also now supported in Unsloth!
There is a bit of confusion in the guides: the 20B is using --jinja and the 120B isn't? The model is still outputting stuff like "<|channel|>analysis". Why use --jinja when it's Harmony? I'm just wondering about all that mess around the chat template thing while Ollama is working fine. The issue isn't the original template, it's llama.cpp. My 2 cents!
Many of the issues are unrelated to llama.cpp.
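To clarify the --jinja part: the flag tells llama.cpp to render the Jinja chat template embedded in the GGUF (which is exactly what emits the Harmony tokens like <|channel|>final) instead of falling back to its built-in templates, so you want it for both the 20B and the 120B. A rough sketch, assuming current llama-server flags and an illustrative model filename:

```sh
# --jinja = use the GGUF's embedded Jinja chat template (renders Harmony).
# Without it llama.cpp falls back to a built-in template, and raw tokens
# like <|channel|>analysis can leak into the output.
# Model path/filename is illustrative.
./llama-server -m gpt-oss-20b-F16.gguf --jinja -ngl 99 -c 16384
```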
Gonna download F16 right now, will report back if it's better :)
I re-downloaded the model files and used Llama Box, which is based on llama.cpp, but the problem persists. I hope to hear from someone who has tried it successfully.
I tried it with llama.cpp and reasoning set to high, and it's giving a lot better answers now; it actually thinks pretty thoroughly, unlike before, when no matter what I set the reasoning effort or system prompt to, it would only think for like 3 sentences at a time. Thanks for the hard work @shimmyshimmer!
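In case it helps anyone else, this is roughly the setup; the --chat-template-kwargs flag is an assumption about newer llama-server builds (check your build's --help). The gpt-oss Jinja template reads a reasoning_effort value and writes the "Reasoning: high" line into the Harmony system message:

```sh
# Assumes a recent llama-server build with --chat-template-kwargs;
# model path is illustrative. The template turns reasoning_effort into
# the "Reasoning: high" line of the Harmony system message.
./llama-server -m gpt-oss-20b-F16.gguf --jinja \
  --chat-template-kwargs '{"reasoning_effort": "high"}'
```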
Amazing to hear, thanks for trying the new one out :D