Does this method degrade quality beyond what direct AutoAWQ would induce?

Opened by Delnith

AutoAWQ doesn't support Gemma 3 properly yet, and neither does GPTQModel for multimodal models. I'm curious about this model, which you say was converted to AutoAWQ format from GGUF: what are the mechanics of that conversion, and does it degrade quality any more than it would have if AutoAWQ had been used directly (assuming it were supported)?

It should work correctly with vLLM. Last time I checked, the AutoAWQ library does not support BF16, so you can't run Gemma 3 correctly without some hacks. I don't know much about multimodal support in AutoAWQ.
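
For anyone who wants to try it, here is a rough sketch of loading an AWQ checkpoint like this one with vLLM. The repo ID below is a placeholder (substitute this model's actual ID), and the prompt and sampling settings are only illustrative:

```python
# Minimal sketch: text-only inference against an AWQ checkpoint with vLLM.
# vLLM picks up the quantization scheme from the checkpoint's config.
from vllm import LLM, SamplingParams

llm = LLM(model="gaunernst/gemma-3-27b-it-qat-autoawq")  # placeholder repo ID

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize what AWQ quantization does."], params)
print(outputs[0].outputs[0].text)
```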

Btw, I would recommend using the compressed-tensors version here instead: https://huggingface.co/gaunernst/gemma-3-27b-it-qat-compressed-tensors, since, like the original GGUFs, it doesn't use zero points.
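
If you go with that version, the same vLLM call should work by just swapping in the model ID, since vLLM can also load compressed-tensors checkpoints (again, only a sketch, not tested here):

```python
# Sketch: pointing vLLM at the compressed-tensors repo instead.
from vllm import LLM

llm = LLM(model="gaunernst/gemma-3-27b-it-qat-compressed-tensors")
```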

> does it degrade quality any more than it would have if AutoAWQ had been used directly

Can't comment on the accuracy. I guess someone needs to run evals to find out.
