Tokenizer or template bug
Q4_K_XL quant of MedGemma-27b-text-it GGUF in llama.cpp
initial response begins:
<unused94>thought
Here's a breakdown of the thinking process...
Hi there, I tried it in llama.cpp and the error doesn't occur. Do you know if it's specific to the Q4_K_XL quant?
Yes, 100%: the Unsloth Q4_K_XL.
llama.cpp b5423. I didn't modify the sampler settings. It occurred on the very first attempt to use the model, so I assumed it would be easy to reproduce. The prompt was something like "describe all medications that can be used to treat X".
I am also experiencing this issue using Q8_K_XL with ollama. Not sure how to fix it, or whether it's just a quirk of the model itself. This model feels like they did SFT over reasoning traces, tbh.
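Until the quant's tokenizer/template metadata is fixed, one possible client-side workaround is to strip the leaked reasoning block before displaying the response. A minimal sketch: it assumes the block opens with the literal text <unused94>thought shown above, and assumes <unused95> as the closing marker (the closing marker does not appear in this thread and is a guess), so adjust the pattern to whatever your output actually contains.

```python
import re

# Assumed markers for the leaked reasoning block, based on the output
# quoted in this thread. "<unused95>" as the closing marker is a guess;
# if the block runs to end-of-text, the \Z alternative catches it.
THINK_BLOCK = re.compile(r"<unused94>thought.*?(?:<unused95>|\Z)", re.DOTALL)

def strip_leaked_thoughts(text: str) -> str:
    """Remove any leaked <unused94>thought ... <unused95> reasoning block."""
    return THINK_BLOCK.sub("", text).lstrip()

raw = ("<unused94>thought\n"
       "Here's a breakdown of the thinking process...\n"
       "<unused95>\n"
       "Final answer here.")
print(strip_leaked_thoughts(raw))  # -> Final answer here.
```

This only hides the symptom; the underlying fix is for the GGUF to mark these tokens as control tokens so they are never rendered as text in the first place.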