Does this method degrade quality beyond what direct AutoAWQ would induce?

Opened by Delnith

AutoAWQ doesn't support Gemma 3 properly yet, and neither does GPTQModel for multimodal models. I'm curious about this model, which you say was converted to AutoAWQ format from GGUF: what are the mechanics of that conversion, and does it degrade quality any more than it would have if AutoAWQ had been used directly (assuming it were supported)?

It should work correctly with vLLM. Last time I checked, the AutoAWQ library does not support BF16, so you can't run Gemma 3 correctly without some hacks. I don't know much about multimodal support in AutoAWQ.
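
For anyone who wants to try it, here is a rough sketch of loading an AWQ checkpoint like this one with vLLM. The repo ID below is a placeholder (substitute this model's actual ID), and the prompt and sampling settings are only illustrative:

```python
# Minimal sketch: text-only inference against an AWQ checkpoint with vLLM.
# vLLM picks up the quantization scheme from the checkpoint's config.
from vllm import LLM, SamplingParams

llm = LLM(model="gaunernst/gemma-3-27b-it-qat-autoawq")  # placeholder repo ID

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize what AWQ quantization does."], params)
print(outputs[0].outputs[0].text)
```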

Btw, I would recommend using the compressed-tensors version here instead: https://huggingface.co/gaunernst/gemma-3-27b-it-qat-compressed-tensors, since, like the original GGUFs, it doesn't use zero points.
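
If you go with that version, the same vLLM call should work by just swapping in the model ID, since vLLM can also load compressed-tensors checkpoints (again, only a sketch, not tested here):

```python
# Sketch: pointing vLLM at the compressed-tensors repo instead.
from vllm import LLM

llm = LLM(model="gaunernst/gemma-3-27b-it-qat-compressed-tensors")
```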

> does it degrade quality any more than it would have if AutoAWQ had been used directly

Can't comment on the accuracy. I guess someone needs to run evals to find out.
