Model bigger than regular Q4_K_M. What is the difference then? (GGUF v2.0)

#6
by Pumba2 - opened

Hi

In your report you say the 27B version is 2 GB smaller than QAT: "Our dynamic 4bit version is 2GB smaller whilst having +1% extra accuracy vs the QAT version!"
But this UD Q4_K_XL one is actually bigger than a regular Q4_K_M.
What's the point?


If you look at the per-tensor quantization (https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main?show_file_info=gemma-3-12b-it-Q4_K_M.gguf) you'll see it's not Q4_K everywhere but mixes in higher quants for increased quality.
I don't know how the QAT model spends its bits :-D
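To see why mixing in higher quants changes the file size, a rough back-of-the-envelope sketch: llama.cpp k-quant super-blocks hold 256 weights, with Q4_K packing a block into 144 bytes and Q6_K into 210 bytes. The 80/20 mix ratio below is purely illustrative, not taken from any actual model.

```python
# Rough bits-per-weight arithmetic for llama.cpp k-quants.
# Block sizes: Q4_K = 144 bytes / 256 weights, Q6_K = 210 bytes / 256 weights.

def bits_per_weight(block_bytes: int, block_weights: int = 256) -> float:
    """Average bits stored per weight for a given quant block layout."""
    return block_bytes * 8 / block_weights

q4_k = bits_per_weight(144)  # 4.5 bpw
q6_k = bits_per_weight(210)  # 6.5625 bpw

# Hypothetical dynamic quant keeping 20% of weights at Q6_K instead of Q4_K:
# the weighted average bpw (and hence file size) grows accordingly.
mixed = 0.8 * q4_k + 0.2 * q6_k
print(f"Q4_K: {q4_k} bpw, Q6_K: {q6_k} bpw, 80/20 mix: {mixed} bpw")
```

So two files both labelled "4-bit" can differ by gigabytes simply because they assign higher-precision quants to different fractions of the tensors.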

Unsloth AI org

It's because Gemma 3 has a unique architecture, which is why it's bigger - but it'll be faster to run.
