Model bigger than regular Q4_K_M. What is the difference then? (GGUF v2.0)
#6 by Pumba2 - opened

Hi
In your report you say the 27B version is 2 GB smaller than QAT: "Our dynamic 4bit version is 2GB smaller whilst having +1% extra accuracy vs the QAT version!"
But this UD Q4_K_XL one is actually bigger than a regular Q4_K_M.
What's the point?
If you look at the quantization per matrix (https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main?show_file_info=gemma-3-12b-it-Q4_K_M.gguf) you'll see it's not Q4_K everywhere, but mixes in higher quants for increased quality.
I don't know how the QAT model spends its bits :-D
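A back-of-the-envelope sketch of why that mixing grows the file: the bits-per-weight figures below are rough averages for llama.cpp quant types, and the layer split is invented purely for illustration, not taken from the actual model.

```python
# Approximate average bits-per-weight for some llama.cpp quant types
# (assumed round numbers for illustration, not exact).
BPW = {"Q4_K": 4.5, "Q6_K": 6.56, "Q8_0": 8.5}

def size_gb(layers):
    """Estimate file size in GB from (num_params, quant_type) pairs."""
    bits = sum(n * BPW[q] for n, q in layers)
    return bits / 8 / 1e9

# Hypothetical 12B model: all-Q4_K vs. a mix that upgrades a
# 2B-parameter subset of sensitive tensors to Q6_K.
pure = size_gb([(12e9, "Q4_K")])
mixed = size_gb([(10e9, "Q4_K"), (2e9, "Q6_K")])
print(f"pure Q4_K: {pure:.2f} GB, mixed: {mixed:.2f} GB")
```

Upgrading even a modest fraction of tensors to a higher-bit quant adds roughly half a gigabyte here, which is the same effect that can push a dynamic quant above a plain Q4_K_M.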
It's because Gemma 3 has a unique architecture, so that's why it's bigger, but it'll be faster to run.