Why quants > q4?
#1, opened by tarruda
From my understanding, Google re-trained Gemma 3 with 4-bit quantization in the loop to produce the QAT versions, so I'm curious why this repo has quants > q4. Will these versions produce better results than q4?
Thanks for your amazing work!
You're correct. I was tempted to do Q4-only, but I thought it would be nice to see how the model behaves at different precisions. Curious to see the results if anyone tries other quants :)
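
For anyone who wants some intuition on what "different precisions" means numerically, here is a toy sketch in plain Python. It is only an illustration under stated assumptions: it uses simple symmetric round-to-nearest quantization on one made-up block of weights, which is not the actual GGUF quant format and does not model QAT at all. The function names and the random block are hypothetical. It shows that round-trip error shrinks as bit width grows, but it cannot tell you whether higher quants of a QAT checkpoint give better outputs; that still needs real evaluation.

```python
import random


def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization of one block, then dequantize.

    Toy scheme for illustration only (assumption: one shared scale per block).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:
        return list(weights)  # all-zero block needs no quantization
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


random.seed(0)
# Hypothetical block of 32 weights; real model weights will look different.
block = [random.gauss(0.0, 0.02) for _ in range(32)]

for bits in (4, 5, 6, 8):
    print(f"{bits}-bit mean abs error: {mean_abs_error(block, bits):.6f}")
```

In this toy setup the error at 8 bits is much smaller than at 4 bits, simply because the grid is finer. With a QAT model the weights were trained to sit well on a 4-bit grid, so the practical gain from higher quants may be small; comparing perplexity or benchmark scores across the quants in this repo would be the way to find out.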