Why are the file sizes so similar?

by Olafangensan - opened 5 days ago

Discussion

Olafangensan

5 days ago

Are "smaller" quants faster or something?
Is there a reason for saving what looks like a single gigabyte?

bartowski

Owner 5 days ago

No, I only really released them so people won't ask why they aren't there. The reason they're so similar is that the model does not behave nicely when the FFN tensors (the biggest part of the model) are quantized to anything besides Q8_0 or MXFP4. Since the FFN dominates the total size of the model, all the quants end up being basically the same size.

I will actually update the chart to make that a bit clearer.
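To put rough numbers on the size argument, here is a minimal Python sketch. The parameter counts are hypothetical and not taken from this model, and the bits-per-weight figures are approximate values for each quant type. It assumes the FFN stays at MXFP4 in every file and only the remaining tensors change type.

```python
# Rough size estimate: the FFN is pinned to one type, only the rest varies.
# All parameter counts are hypothetical, chosen only to illustrate the effect.

FFN_PARAMS = 19e9    # hypothetical: parameters in the FFN tensors
OTHER_PARAMS = 2e9   # hypothetical: attention, embeddings, everything else

# Approximate bits per weight, including per-block scale overhead.
FFN_BPW = 4.25       # MXFP4: 32 x 4-bit values plus one 8-bit scale per block
OTHER_BPW = {
    "Q8_0": 8.5,     # 32 x 8-bit values plus one 16-bit scale per block
    "Q6_K": 6.5625,  # approximate
    "Q4_K": 4.5,     # approximate
}


def file_size_gb(ffn_params, other_params, ffn_bpw, other_bpw):
    """Approximate file size in GB (10^9 bytes), ignoring metadata."""
    total_bits = ffn_params * ffn_bpw + other_params * other_bpw
    return total_bits / 8 / 1e9


sizes = {
    name: file_size_gb(FFN_PARAMS, OTHER_PARAMS, FFN_BPW, bpw)
    for name, bpw in OTHER_BPW.items()
}

for name, size in sizes.items():
    print(f"{name}: {size:.2f} GB")

# Gap between the largest and smallest file under these assumptions.
spread_gb = max(sizes.values()) - min(sizes.values())
print(f"spread: {spread_gb:.2f} GB")
```

With these made-up counts the FFN alone accounts for about 10 GB of every file, so changing the type of everything else moves the total by only about a gigabyte.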

Olafangensan changed discussion status to closed 5 days ago
