Why quants > q4?

#1
by tarruda - opened

From my understanding, Google retrained Gemma 3 under 4-bit quantization to produce the QAT versions, so I'm curious why this repo has quants above Q4. Will these versions produce better results than Q4?

Thanks for your amazing work

You're correct, I was tempted to do Q4-only, but I thought it could be nice to see how the model behaves at different precisions. Curious to see the results if anyone tries other quants :)
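For anyone who wants a rough feel for what changes between precisions, here is a toy sketch. It assumes plain symmetric round-to-nearest quantization per block of 32 values, which is not the actual scheme behind any of the GGUF quant types, and it runs on random Gaussian numbers rather than Gemma weights. The names `quantize_roundtrip` and `rms_error` are made up for illustration.

```python
import random


def quantize_roundtrip(weights, bits, block_size=32):
    """Quantize to signed `bits`-bit integers per block, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # One scale per block, chosen so the largest value maps to qmax.
        scale = max(abs(w) for w in block) / qmax
        if scale == 0.0:
            out.extend(block)  # all-zero block, nothing to quantize
            continue
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))
            out.append(q * scale)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
# Stand-in data: random values, NOT real model weights.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {
    bits: rms_error(weights, quantize_roundtrip(weights, bits))
    for bits in (4, 5, 6, 8)
}
for bits, err in errors.items():
    print(f"{bits}-bit RMS rounding error: {err:.6f}")
```

This only measures rounding error against the original values, which shrinks as the bit width grows. It says nothing about how a QAT-trained model responds to that error, so whether the higher quants give better results than Q4 here still has to be checked by actually running them.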
