Not able to run model using vLLM
#3 by Pchaudhary - opened
When I run the model using vLLM in a GPU environment, I get the following error:
[rank0]: RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
My GPU configuration is:
GPU Model: Tesla T4
CUDA Compute Capability: 7.5
Total Memory: 15948 MB (approximately 16 GB)
Number of Multiprocessors: 40
Can you suggest a workaround for running this model?
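
For reference, the reported compute capability can be confirmed directly with PyTorch; a minimal sketch:

```python
import torch

# Report the CUDA compute capability of the active device;
# a Tesla T4 returns (7, 5), i.e. compute capability 7.5.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
```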
Hi @Pchaudhary, vLLM only supports FP8 models on Ampere (compute capability 8.0) and newer for weight-only quantization, and on Ada Lovelace (compute capability 8.9) and newer for weights+activations. We don't have any planned support for older GPUs.
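
A practical workaround on a T4 is therefore to serve a non-FP8 (e.g. FP16) variant of the same model; a minimal sketch, where the model ID is a placeholder to replace with your FP16 checkpoint:

```python
from vllm import LLM, SamplingParams

# On pre-Ampere GPUs such as the Tesla T4, load an unquantized
# (FP16) checkpoint instead of the FP8 one. The model ID below is
# a placeholder; substitute the FP16 variant of your model.
llm = LLM(model="your-org/your-model-fp16", dtype="float16")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```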
mgoin changed discussion status to closed