Why are XL quants smaller than M quants?
#11 · opened by ChuckMcSneed
Q3_K_M: 112 GB
Q3_K_XL: 104 GB
Q4_K_M: 142 GB
Q4_K_XL: 134 GB
Doesn't look right.
This is normal and happens sometimes: we reduce some less important layers to a lower bit width and raise some important ones to a higher bit width, and occasionally the result ends up smaller overall.
You can still use the Q4_K_M one, which also uses our calibration dataset.
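As a rough illustration of the point above: lowering the bit width of most layers can shrink the file more than raising a few layers adds back, so a mixed allocation can come out smaller than a uniform one. This is a minimal sketch; all layer counts, parameter counts, and bit widths below are made up for demonstration and do not reflect any actual quant recipe.

```python
# Hypothetical illustration: a mixed per-layer bit allocation can be smaller
# overall than a uniform one, even when a few layers get MORE bits.
# All numbers here are invented for the example.

layer_params = [7e9] * 10  # ten layers, 7B params each (hypothetical)

uniform_bits = [4.5] * 10           # uniform medium allocation (hypothetical)
mixed_bits = [3.5] * 7 + [6.0] * 3  # most layers lower-bit, a few higher

def total_gb(params, bits):
    """Total size in GB given per-layer parameter counts and bit widths."""
    return sum(p * b for p, b in zip(params, bits)) / 8 / 1e9

print(f"uniform: {total_gb(layer_params, uniform_bits):.1f} GB")  # 39.4 GB
print(f"mixed:   {total_gb(layer_params, mixed_bits):.1f} GB")    # 37.2 GB
```

With these made-up numbers the mixed allocation saves about 2 GB despite three layers being stored at a higher bit width, which is the same effect seen in the Q3/Q4 sizes quoted above.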
Which one is better quality?
I'd say they perform about equally. The Q4_K_XL one might be a tad faster.
How do they compare to standard K_L quants?