vLLM serve
#1 by Marcuswas - opened
Hello!
Does anyone know the right arguments to serve this model with vLLM? I've tried different configurations without success (I'm running it locally on my server):
vllm serve /data/sdia/downloaded_models/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit --tokenizer-mode mistral --config-format mistral --load-format mistral --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 44851
vllm serve /data/sdia/downloaded_models/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit --load_format 'bitsandbytes' --quantization 'bitsandbytes' --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 44851
Thanks!
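Edit: for context, this is the variant I'd expect to be closest, going by vLLM's documented bitsandbytes flags. It's only a sketch and still untested on my side; the mistral-specific flags (`--tokenizer-mode mistral`, `--config-format mistral`, `--load-format mistral`) are meant for the original consolidated Mistral weights, so I've dropped them for this bnb-4bit Hugging Face checkpoint, and the `--max-model-len` value here is just a guess to keep the KV cache small:

```shell
# Sketch only -- assumes vLLM's bitsandbytes support; the model path is my local one.
# --quantization/--load-format bitsandbytes tell vLLM this is a bnb-quantized HF
# checkpoint; --max-model-len 8192 is an arbitrary cap to reduce KV-cache memory.
vllm serve /data/sdia/downloaded_models/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --gpu-memory-utilization 0.8 \
  --max-model-len 8192 \
  --host 0.0.0.0 --port 44851
```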