vLLM, SGLang
Has anyone been able to deploy this AWQ model with either SGLang or vLLM? If so, please provide the version / PR etc. Thank you.
vLLM works fine with good quality. Recommended.
Which version of vLLM works fine for you?
How fast does this run with 48 GB of VRAM on vLLM/SGLang?
I got this to work with vLLM 0.8.5 on 8x A10 cards. It required providing a fused_moe config file at "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A10,dtype=int4_w4a16.json", since vLLM only ships configs for H-series cards and the default settings do not work on A10s without one. The config file can be mostly generated with the benchmark_moe.py script from the vLLM project.
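For anyone else trying this: vLLM's fused_moe config files are plain JSON dicts keyed by batch size (as strings), where each entry holds Triton kernel launch parameters. A minimal sketch of writing one, assuming the naming scheme from the path above — the block-size numbers here are illustrative placeholders, not tuned values; use the output of benchmark_moe.py for real tuning:

```python
import json

# Illustrative fused_moe config. Keys are batch sizes (as strings);
# values are Triton kernel launch parameters. The numbers below are
# placeholders -- real values should come from benchmark_moe.py tuning.
example_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 2},
    "16": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
}

# File name follows the scheme vLLM expects (experts, shard size,
# device name, dtype), matching the path quoted in the post above.
path = "E=128,N=192,device_name=NVIDIA_A10,dtype=int4_w4a16.json"
with open(path, "w") as f:
    json.dump(example_config, f, indent=2)
```

vLLM picks the config entry whose batch-size key is closest to the actual batch at runtime, so covering a spread of batch sizes (not just "1") matters for throughput.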