Why Am I Getting an Out-Of-Memory Error with My GPU Specs?
#7 by chunjae - opened
Hi, I have a setup with four A100 GPUs, each with 40GB of VRAM.
I believe this hardware should be sufficient to load the quantized Llama-4 model (listed as 57.4B parameters on the homepage), but I'm running into CUDA out-of-memory errors.
Could someone please explain why this might be happening?
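For context, here is the back-of-envelope estimate I was working from. It is only a sketch: the bytes-per-parameter value is my assumption (I'm not sure exactly how this quantized checkpoint stores its weights), and it ignores everything besides the raw weights.

```python
# Rough estimate of weight memory, assuming the listed 57.4B parameters
# and ~1 byte per parameter for an 8-bit-style quantization (assumption).
num_params = 57.4e9
bytes_per_param = 1.0            # assumption; BF16/FP16 would be 2.0
weight_gib = num_params * bytes_per_param / 1024**3

num_gpus = 4
vram_per_gpu_gib = 40
per_gpu_share_gib = weight_gib / num_gpus   # if sharded with tensor parallelism

print(f"Weights: ~{weight_gib:.1f} GiB total, "
      f"~{per_gpu_share_gib:.1f} GiB per GPU (of {vram_per_gpu_gib} GiB each)")
# This leaves out the KV cache, activations, and CUDA overheads, which is why
# I expected the weights themselves to fit on this setup.
```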
I am running Llama 4 with vLLM, following the command from the official blog post (https://blog.vllm.ai/2025/04/05/llama4.html), but it still fails with an out-of-memory error on this model.
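For reference, this is roughly how I understand that setup translates to the Python API on my four-GPU machine. It is only a sketch: the model id is a stand-in (in practice I point it at the quantized repo from this page), and the context-length and memory-utilization values are guesses at reasonable settings rather than the exact ones I used.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # stand-in repo id
    tensor_parallel_size=4,        # shard weights across the four A100 40GB cards
    max_model_len=8192,            # cap the context so the KV cache stays small
    gpu_memory_utilization=0.90,   # fraction of each GPU's VRAM vLLM may claim
)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```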
Could anyone offer some advice? Thank you.
I have a setup with two V100 GPUs, 80GB each.