vLLM serve

#1
by Marcuswas - opened

Hello!

Does anyone know the arguments needed to serve this model with vLLM? I've tried several configurations without success (I'm running it locally on my server):
vllm serve /data/sdia/downloaded_models/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit --tokenizer-mode mistral --config-format mistral --load-format mistral --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 44851
vllm serve /data/sdia/downloaded_models/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit --load_format 'bitsandbytes' --quantization 'bitsandbytes' --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 44851

Thanks!
