Model bigger than regular Q4_K_M. What is the difference then? (GGUF v2.0)

#6
by Pumba2 - opened

Hi

In your report you say the 27B version is 2 GB smaller than QAT: "Our dynamic 4bit version is 2GB smaller whilst having +1% extra accuracy vs the QAT version!"
But this UD Q4_K_XL one is actually bigger than a regular Q4_K_M.
What's the point?


If you look at the per-tensor quantization (https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main?show_file_info=gemma-3-12b-it-Q4_K_M.gguf) you'll see it's not Q4_K everywhere but mixes in higher quants for increased quality.
I don't know how the QAT model spends its bits :-D
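To see why mixing in higher quants changes the file size, a rough back-of-the-envelope sketch: llama.cpp k-quant super-blocks hold 256 weights, with Q4_K packing a block into 144 bytes and Q6_K into 210 bytes. The 80/20 mix ratio below is purely illustrative, not taken from any actual model.

```python
# Rough bits-per-weight arithmetic for llama.cpp k-quants.
# Block sizes: Q4_K = 144 bytes / 256 weights, Q6_K = 210 bytes / 256 weights.

def bits_per_weight(block_bytes: int, block_weights: int = 256) -> float:
    """Average bits stored per weight for a given quant block layout."""
    return block_bytes * 8 / block_weights

q4_k = bits_per_weight(144)  # 4.5 bpw
q6_k = bits_per_weight(210)  # 6.5625 bpw

# Hypothetical dynamic quant keeping 20% of weights at Q6_K instead of Q4_K:
# the weighted average bpw (and hence file size) grows accordingly.
mixed = 0.8 * q4_k + 0.2 * q6_k
print(f"Q4_K: {q4_k} bpw, Q6_K: {q6_k} bpw, 80/20 mix: {mixed} bpw")
```

So two files both labelled "4-bit" can differ by gigabytes simply because they assign higher-precision quants to different fractions of the tensors.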

Unsloth AI org

It's because Gemma 3 has a unique architecture, which is why it's bigger - but it'll be faster to run.
