VLLM launch command?

#1
by nfunctor - opened

Hi, and thanks for the quantisation!

I tried running it with vLLM and the suggested version of transformers, but got the error: ValueError: Mistral3ForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.

Setting VLLM_USE_V1=0 to fall back to the V0 engine does not work either. The vLLM repo says that only the Mistral format is supported for loading, but adding all the flags like --load-format mistral etc. does not seem to work. Could you please help with that?
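
For reference, the launch I attempted looked roughly like this (the model ID is just a stand-in for this quant; the Mistral-format flags are the ones from the vLLM docs):

vllm serve <this-quant-repo-id> --tokenizer-mode mistral --config-format mistral --load-format mistral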

IST Austria Distributed Algorithms and Systems Lab org
edited Mar 20

@nfunctor Mistral3 support is WIP in vLLM. One has to wait a bit until they produce a working implementation.

Mistral 3.1 support is now official in vLLM, at least when using the Mistral model format, not HF weights. How would one use this quant with vLLM, since it's not in the Mistral format?

vLLM now supports Mistral3 in HF format:
https://github.com/vllm-project/vllm/commit/51d7c6a2b23e100cd9e7d85b8e7c0eea656b331e

You can use this model after installing the nightly version:

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
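
After that, serving the HF-format weights should work without the Mistral-format flags; something like the following should do (substitute this repo's model ID, and note I have not checked whether this particular quant needs extra flags):

vllm serve <this-quant-repo-id>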
IST Austria Distributed Algorithms and Systems Lab org

@hfmon the chat_template in the original Mistral repo was updated. I pushed it here, and now one can run inference via HF as well.

thanks for the very quick reaction! :)
