Stuck when run on 8xH100

#8
by Thai - opened

Hi, I'm running your model on my server (8xH100), but when the model has loaded about 11 shards, it hangs and I cannot connect until I reset the server. I don't know what is happening.
Here is my command: VLLM_USE_V1=0 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_MARLIN_USE_ATOMIC_ADD=1 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 3000 --max-seq-len-to-capture 65536 --enable-chunked-prefill --enable-prefix-caching --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --served-model-name _deep_seek_chat --max-num-batched-tokens 65536 --enable-auto-tool-choice --tool-call-parser llama3_json --max-num-seqs 64 --model DeepSeek-V3-0324-AWQ/

Cognitive Computations org
edited 3 days ago

Are you running it on Vast? Is your NCCL set up correctly? It sounds like an NCCL communication fault to me.
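If it is an NCCL fault, one way to surface it is to rerun with NCCL's debug logging enabled so the hang produces output instead of silence. A minimal sketch, reusing the command above; `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are standard NCCL environment variables, but whether disabling P2P helps depends on your hardware topology:

```shell
# Turn on verbose NCCL logging to see where communication stalls.
export NCCL_DEBUG=INFO
# Optional experiment: rule out faulty peer-to-peer (NVLink/PCIe) transfers.
# Remove this line once you've confirmed P2P is not the culprit, as it costs bandwidth.
export NCCL_P2P_DISABLE=1

VLLM_USE_V1=0 VLLM_WORKER_MULTIPROC_METHOD=spawn \
python -m vllm.entrypoints.openai.api_server \
  --model DeepSeek-V3-0324-AWQ/ \
  --tensor-parallel-size 8 \
  --trust-remote-code
```

If the logs stop at an all-reduce or broadcast during weight loading, that points at the interconnect or NCCL configuration rather than at the model itself.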
