Tokenizer or template bug

#1
by beijinghouse - opened

Q4_K_XL quant of MedGemma-27B-text-it GGUF in llama.cpp

initial response begins:

<unused94>thought
Here's a breakdown of the thinking process...

Unsloth AI org

Hi there, I tried it in llama.cpp and the error doesn't occur. Do you know if it's specific to the Q4 XL quant?

Yes, 100% the Unsloth Q4_K_XL.

llama.cpp b5423. I didn't modify sampler settings. It occurred on the very first attempt to use the model, so I assumed it would be easy to reproduce. The prompt was something like "describe all medications that can be used to treat X".

I am also experiencing this issue using Q8_K_XL with Ollama. Not sure how to fix it, or if it is just a weird quirk of the model itself. This model feels like they did SFT over reasoning traces tbh.
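Until the template/tokenizer is fixed upstream, one workaround is to strip the leaked marker from the model's output in application code. Below is a minimal sketch assuming the leak always looks like the `<unused94>thought` prefix reported above; the pattern and function name are my own, not anything from the model card or llama.cpp.

```python
import re

# Matches a leaked Gemma reasoning marker such as "<unused94>thought"
# at the very start of a response, plus any surrounding whitespace.
LEAK_PATTERN = re.compile(r"^\s*<unused94>thought\s*")

def strip_leaked_marker(text: str) -> str:
    """Remove a stray '<unused94>thought' prefix if present; otherwise return text unchanged."""
    return LEAK_PATTERN.sub("", text, count=1)

# Example: the response quoted in this thread
print(strip_leaked_marker("<unused94>thought\nHere's a breakdown of the thinking process..."))
# Normal responses pass through untouched
print(strip_leaked_marker("Here is a list of medications..."))
```

This only hides the symptom; if the model was actually trained to emit reasoning traces, the downstream text may still read like chain-of-thought.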
