Error running on A100?

#4
by traphix - opened

The command is:

vllm serve \
    --served-model-name qwen3-235b-a22b \
    --model /data/model-cache/Qwen3-235B-A22B-FP8-dynamic/ \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --max-model-len 40960 \
    --gpu-memory-utilization 0.94 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --disable-log-stats \
    --disable-log-requests \
    --host 0.0.0.0 \
    --port 60522

and I got this error:

ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")

Does the A100 support FP8-dynamic quantization for MoE models?

Red Hat AI org

Unfortunately, FP8 models are not supported on Ampere-architecture GPUs like the A100. We are working on an INT8 version of this model.
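For context, hardware FP8 (e4m3/e5m2) support starts at CUDA compute capability 8.9 (Ada Lovelace) and 9.0 (Hopper), while the A100 is Ampere at 8.0 — which is why Triton rejects the `fp8e4nv` dtype. A minimal sketch of the check (the `supports_fp8` helper is hypothetical, not part of vLLM; on a live machine you could get the capability tuple from `torch.cuda.get_device_capability()`):

```python
# Assumption: hardware FP8 requires compute capability >= 8.9
# (Ada Lovelace / Hopper). Ampere A100 is sm_80, so it fails.

def supports_fp8(major: int, minor: int) -> bool:
    """Return True if a GPU with this compute capability has FP8 tensor-core support."""
    return (major, minor) >= (8, 9)

# On a machine with a GPU, the tuple would come from:
#   major, minor = torch.cuda.get_device_capability()
for name, cc in {"A100": (8, 0), "L4": (8, 9), "H100": (9, 0)}.items():
    print(f"{name}: FP8 supported = {supports_fp8(*cc)}")
```

A100 (sm_80) fails the check, which matches the `fp8e4nv not supported in this architecture` error above.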

Any progress?