Quality benefits of UD-Q4_K_XL vs Q5_K_M vs Q6_K for this model?

#14
by ideosphere - opened

I see your article:

https://unsloth.ai/blog/dynamic-v2

It has a graph of MMLU 5-shot scores for Llama 4 Scout across the quantization formats in the UD Dynamic 2.0 quant series. In that graph, Q4_K_XL scores about the same as Q5_K_M, while Q6_K and above show a more distinct improvement over both Q5_K_M and Q4_K_XL.

But the article starts off saying:
"...Model-Specific Quants: Each model now uses a custom-tailored quantization scheme. E.g. the layers quantized in Gemma 3 differ significantly from those in Llama 4."

So, given that the quantization schemes are so model-specific, is there any guidance (other than the obvious route of downloading several and running my own evaluations) for estimating how much benefit there is between UD-Q4_K_XL, Q5_K_M, and Q6_K for this model? Perhaps some metrics that come out of your own quantization / test scripts would be useful to post in the model card, both here and for future quantizations of other models.

Also, the graph in your blog article says it shows the UD 2.0 quants from IQ1_M to Q8_0 for Llama 4 Scout. But here, at the moment, only UD-Q4_K_XL (and the 2- and 3-bit ones) carry the "UD" nomenclature; the larger quants, like "Q5_K_M", have no "UD" component in the file / folder name. Does that mean there are / will be no "UD" quants for Q5/6/8 here?

If UD 2.0 "runs out of benefit" after Q4, it's confusing that the Llama 4 Scout blog graph shows the 1- to 8-bit quants of that model as UD 2.0, as if it were a generally good thing at all quantization levels.
