Some initial results comparing size and perplexity

#1
by ubergarm - opened

I cooked some quants from this using ik_llama.cpp and wrote up my initial observations: https://github.com/ikawrakow/ik_llama.cpp/discussions/334

Two ik_llama.cpp fork quants available here: https://huggingface.co/ubergarm/gemma-3-27b-it-qat-GGUF

Google org

Hi, awesome work! Thank you so much for taking the time to create these GGUF quants for the new Gemma 3 27B model. Providing GGUF quantizations for Gemma 3 27B so quickly helps developers and researchers everywhere experiment with the model on consumer hardware.

@lkv

Thanks! Are you working for Google? Yeah, ik's SOTA quants are quite impressive. It's been a fun ride cooking so many as they become available. If you're unfamiliar with ik's work, I suggest you watch this recent 2025 FOSDEM talk with ik for some more background on the scene.

The new iqN_kt QTIP/trellis quants (similar to EXL3) are also quite promising for inference on the CUDA backend. I haven't gone back to try an IQ4_KT, which could work quite well with your Gemma 3 27B QAT given its 4.0 BPW target size.
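For a rough sense of why a 4.0 BPW quant is attractive here, a back-of-the-envelope size estimate is just parameters times bits-per-weight divided by eight. This is a minimal sketch, assuming a uniform 4.0 BPW across all 27B weights; real GGUF files land somewhat higher because some tensors (e.g. embeddings and the output head) are typically kept at higher precision.

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    """Estimated on-disk size in GB for a uniformly quantized model."""
    # bits -> bytes -> gigabytes (decimal GB, matching how file sizes
    # are usually reported on model hubs)
    return n_params * bpw / 8 / 1e9

# 27B parameters at a 4.0 BPW target (hypothetical uniform quant):
print(f"{quant_size_gb(27e9, 4.0):.1f} GB")  # -> 13.5 GB
```

So a uniform 4-bit quant of a 27B model sits around 13.5 GB before any higher-precision tensors are accounted for, which is what puts it in reach of 16 GB and 24 GB consumer GPUs.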

Cheers, and curious to see what y'all open-weight next!
