Error running on A100?
#4
by traphix · opened
The command is:
vllm serve \
--served-model-name qwen3-235b-a22b \
--model /data/model-cache/Qwen3-235B-A22B-FP8-dynamic/ \
--tensor-parallel-size 4 \
--trust-remote-code \
--max-model-len 40960 \
--gpu-memory-utilization 0.94 \
--enable-reasoning \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--disable-log-stats \
--disable-log-requests \
--host 0.0.0.0 \
--port 60522
and I got this error:
ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
Does the A100 support MoE FP8-dynamic quantization?
Unfortunately, FP8 models are not supported on Ampere-architecture GPUs like the A100. We are working on an INT8 version of this model.
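For context, the `fp8e4nv` (E4M3) dtype in the error comes from Triton, which only supports it on GPUs with CUDA compute capability 8.9 or higher (Ada/Hopper); the A100 is Ampere at 8.0. A minimal sketch of such a capability check (my own illustration, not vLLM's actual code):

```python
# Sketch of an FP8 support check (illustrative, not vLLM's implementation):
# Triton's fp8e4nv dtype requires CUDA compute capability >= 8.9
# (Ada, e.g. L40S) or 9.0 (Hopper, e.g. H100). Ampere A100 is 8.0.
def supports_fp8(compute_capability: tuple[int, int]) -> bool:
    """Return True if the given (major, minor) capability can run fp8e4nv kernels."""
    return compute_capability >= (8, 9)

print(supports_fp8((8, 0)))  # A100 (Ampere) -> False
print(supports_fp8((9, 0)))  # H100 (Hopper) -> True
```

On a live machine you could feed this `torch.cuda.get_device_capability()` to check the local GPU before attempting to serve an FP8 checkpoint.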
Any progress?