How to run in vLLM

by boshko - opened

Can you please update the instructions on how to run this quantized model in vLLM?

Thanks

Try using SGLang; it has a vLLM backend:

      python3 -m sglang.launch_server \
          --served-model-name tonjoo-coder \
          --model-path unsloth/Devstral-Small-2505-bnb-4bit \
          --chat-template /models/devstral.jinja \
          --port 8000 \
          --host 0.0.0.0 \
          --mem-fraction-static 0.8
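
Once the server is up, you can sanity-check it with a request to the OpenAI-compatible chat endpoint (a rough sketch; it assumes the server is reachable on localhost:8000 and uses the served model name from the launch command above):

      # Hypothetical smoke test against the OpenAI-compatible API
      curl http://localhost:8000/v1/chat/completions \
          -H "Content-Type: application/json" \
          -d '{"model": "tonjoo-coder", "messages": [{"role": "user", "content": "Hello"}]}'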

Actually, I tried tp=2 but it did not work; tp=1 might.
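
For reference, pinning tensor parallelism explicitly would look something like this (a sketch, assuming SGLang's --tp flag for the tensor-parallel size, which defaults to 1):

      # Explicitly run on a single GPU (tp=1)
      python3 -m sglang.launch_server \
          --model-path unsloth/Devstral-Small-2505-bnb-4bit \
          --tp 1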

Unsloth AI org

You should be able to do it fine with vLLM directly:

      vllm serve unsloth/Devstral-Small-2505-bnb-4bit \
          --quantization bitsandbytes \
          --load-format bitsandbytes

See https://docs.vllm.ai/en/latest/features/quantization/bnb.html
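
To confirm the quantized model actually loaded, a quick check against the server (assuming vLLM's default OpenAI-compatible endpoint on port 8000):

      # List the models the server is serving
      curl http://localhost:8000/v1/models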

That worked, thanks!

boshko changed discussion status to closed
