Model running badly on vLLM
#2 opened by meiragat
Hi,
I'm running the model and getting really slow, poor-quality results.
Am I doing something wrong, or should I use a different version to run the model on my 24 / 48 GB GPUs?
vLLM docker command:
--model gaunernst/gemma-3-27b-it-int4-awq \
--served-model-name gaunernst/gemma-3-27b-it-int4-awq \
--quantization awq_marlin \
--gpu-memory-utilization 0.90 \
--tensor-parallel-size 1 \
--max-model-len 20072 \
--swap-space 16 \
--trust-remote-code \
--enable-chunked-prefill
What's your GPU? Anyway, I would recommend this checkpoint instead: https://huggingface.co/gaunernst/gemma-3-27b-it-qat-autoawq
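For reference, here's a minimal launch sketch for that checkpoint. It assumes the vllm/vllm-openai Docker image, a single GPU, and the default server port; adjust --max-model-len and --gpu-memory-utilization to fit your card. vLLM can normally detect the AWQ quantization from the checkpoint config, so the --quantization flag can usually be left off:

# serve the QAT AutoAWQ checkpoint with vLLM's OpenAI-compatible server
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model gaunernst/gemma-3-27b-it-qat-autoawq \
  --served-model-name gaunernst/gemma-3-27b-it-qat-autoawq \
  --max-model-len 20072 \
  --gpu-memory-utilization 0.90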
Thanks for responding. The other model didn't seem to be much better for me; I'm posting my results there.
meiragat changed discussion status to closed