vLLM crashes with a slightly longer prompt
#4 opened by rockcat-miao
My startup command:
```bash
docker run --gpus all \
  -e VLLM_USE_V1=0 \
  -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
  -e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
  --shm-size 64g --rm -p 8000:8000 \
  -v /DATA/disk0/models:/data/models \
  vllm/vllm-openai:v0.8.1 \
  --model /data/models/DeepSeek-V3-0324-AWQ \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice --tool-call-parser hermes \
  --served-model-name deepseek-v3 \
  --trust-remote-code \
  --max-model-len 65536 \
  --max-seq-len-to-capture 65536 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --gpu-memory-utilization 0.95
```
It only works well for simple prompts like "hello, how are you"; a slightly longer prompt crashes the server.
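A minimal repro sketch against the server started above (assumes it is reachable on localhost:8000 with served model name deepseek-v3, as in the command; the padded long prompt is just illustrative filler, not the exact prompt that crashed):

```python
# Repro sketch: the short prompt succeeds, the longer one crashes the server.
# Assumes the vLLM OpenAI-compatible endpoint from the docker command above
# (localhost:8000, --served-model-name deepseek-v3); the padding is arbitrary.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

short_prompt = "hello, how are you"
long_prompt = "Please summarize the following text. " + "lorem ipsum " * 200

for prompt in (short_prompt, long_prompt):
    resp = client.chat.completions.create(
        model="deepseek-v3",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    print(resp.choices[0].message.content[:80])
```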
Hardware: 8 × A100 GPUs
v2ray changed discussion status to closed