Some initial results comparing size and perplexity
I cooked some quants from this using ik_llama.cpp and wrote up my initial observations: https://github.com/ikawrakow/ik_llama.cpp/discussions/334
Two ik_llama.cpp fork quants are available here: https://huggingface.co/ubergarm/gemma-3-27b-it-qat-GGUF
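For anyone wanting to cook something similar themselves, the rough recipe looks like the sketch below. File names are illustrative, and exact flags and supported quant types can differ between mainline llama.cpp and the ik_llama.cpp fork, so check the tools' `--help` output rather than treating this as gospel:

```shell
# Sketch: quantize a full-precision GGUF and measure perplexity
# using the llama.cpp-style tools built from ik_llama.cpp.
# Paths and the quant type here are illustrative placeholders.

# 1. Quantize the bf16 GGUF down to a smaller quant type
./bin/llama-quantize \
    gemma-3-27b-it-qat-bf16.gguf \
    gemma-3-27b-it-qat-IQ4_KT.gguf \
    IQ4_KT

# 2. Compare perplexity against a standard text file (e.g. wiki.test.raw)
./bin/llama-perplexity \
    -m gemma-3-27b-it-qat-IQ4_KT.gguf \
    -f wiki.test.raw \
    --ctx-size 512
```

Running step 2 on both the quantized and the full-precision model gives the size-vs-perplexity comparison mentioned above.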
Hi, awesome work! Thank you so much for taking the time to create these GGUF quants for the new Gemma 3 27B model. Providing GGUF quantizations for Gemma 3 27B so quickly helps developers and researchers everywhere experiment with the model on consumer hardware.
Thanks! Are you working for Google? Yeah, ik's SOTA quants are quite impressive. It's been a fun ride cooking so many as they become available. If you're unfamiliar with ik's work, I suggest watching this recent 2025 FOSDEM talk with ik for some more background on the scene.
The new iqN_kt QTIP/Trellis quants (similar to EXL3) are also quite promising for inference on the CUDA backend. I haven't gone back to try an IQ4_KT, which could work quite well with your Gemma 3 27B QAT given its 4.0 BPW target size.
Cheers, and curious to see what y'all open-weight next!