Why are the file sizes so similar?
#1 by Olafangensan - opened
Are "smaller" quants faster or something?
Is there a reason for saving what looks like a single gigabyte?
No, I only really released them so people won't ask why they aren't there. The reason they're so similar is that the model does not behave nicely when the FFN tensors (the biggest parts of the model) are quantized to anything besides Q8_0 or MXFP4. Since the FFN dominates the total size of the model, every quant ends up being basically the same size.
I will update the chart for a bit more clarity on that, actually.
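To make the size argument concrete, here is a rough back-of-the-envelope sketch. The parameter counts (`FFN_PARAMS`, `OTHER_PARAMS`) are made-up illustrative values, not this model's real numbers, and the bits-per-weight figures are approximations derived from the usual GGUF block layouts. It only shows that when the FFN stays at one type, changing the type of everything else moves the total very little.

```python
# Approximate bits per weight, including per-block scale overhead.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q6_K": 6.5625,
    "Q4_K": 4.5,
    "MXFP4": 4.25,   # 32 4-bit weights + one 8-bit scale per block
}

def size_gb(ffn_params, other_params, ffn_type, other_type):
    """Estimate file size in GB when FFN and non-FFN tensors use different types."""
    bits = (ffn_params * BITS_PER_WEIGHT[ffn_type]
            + other_params * BITS_PER_WEIGHT[other_type])
    return bits / 8 / 1e9

# Hypothetical split: the FFN holds the vast majority of the weights.
FFN_PARAMS = 19e9    # assumed, for illustration only
OTHER_PARAMS = 1e9   # assumed, for illustration only

# The FFN stays at MXFP4; only the remaining tensors change type.
for other_type in ("Q8_0", "Q6_K", "Q4_K"):
    total = size_gb(FFN_PARAMS, OTHER_PARAMS, "MXFP4", other_type)
    print(f"FFN=MXFP4, rest={other_type}: {total:.2f} GB")
```

With a split like this, the gap between the largest and smallest variant is well under a gigabyte, because only the small non-FFN share of the weights actually changes size.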
Olafangensan changed discussion status to closed