Not able to run model using vLLM
#3 by Pchaudhary - opened
When I run the model using vLLM in a GPU environment, I get the following error:
[rank0]: RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
My GPU configuration is:
GPU Model: Tesla T4
CUDA Compute Capability: 7.5
Total Memory: 15948 MB (approximately 16 GB)
Number of Multiprocessors: 40
Can you suggest a workaround for running this model?
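
For reference, the reported compute capability can be confirmed directly with PyTorch; a minimal sketch:

```python
import torch

# Report the CUDA compute capability of the active device;
# a Tesla T4 returns (7, 5), i.e. compute capability 7.5.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
```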
Hi @Pchaudhary, vLLM only supports FP8 models on Ampere (compute capability 8.0) and newer for weight-only quantization, and on Ada Lovelace (compute capability 8.9) and newer for weights+activations. We don't have any planned support for older GPUs.
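
A practical workaround on a T4 is therefore to serve a non-FP8 (e.g. FP16) variant of the same model; a minimal sketch, where the model ID is a placeholder to replace with your FP16 checkpoint:

```python
from vllm import LLM, SamplingParams

# On pre-Ampere GPUs such as the Tesla T4, load an unquantized
# (FP16) checkpoint instead of the FP8 one. The model ID below is
# a placeholder; substitute the FP16 variant of your model.
llm = LLM(model="your-org/your-model-fp16", dtype="float16")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```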
mgoin changed discussion status to closed