Why quants > q4?

by tarruda - opened 3 days ago

From my understanding, Google retrained Gemma 3 under 4-bit quantization to produce the QAT versions, so I'm curious why this repo has quants above Q4. Will those versions produce better results than Q4?

Thanks for your amazing work

mlabonne

Owner 1 day ago

You're correct. I was tempted to do Q4 only, but I thought it would be nice to see how the model behaves at different precisions. Curious to see the results if anyone tries the other quants :)
