VLLM launch command?

#1
by nfunctor - opened

Hi, and thanks for the quantisation!

I tried running it with vLLM and the suggested version of transformers, but got the error: ValueError: Mistral3ForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.

Setting VLLM_USE_V1=0 to fall back to the V0 engine does not work either. The vLLM repo says that only the Mistral format is supported for loading, but adding all the flags like --load-format mistral etc. does not seem to work. Could you please help with that?
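
For reference, the launch I attempted looked roughly like this (the model ID is just a stand-in for this quant; the Mistral-format flags are the ones from the vLLM docs):

vllm serve <this-quant-repo-id> --tokenizer-mode mistral --config-format mistral --load-format mistral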

IST Austria Distributed Algorithms and Systems Lab org
edited Mar 20

@nfunctor Mistral3 support is WIP in vLLM. One has to wait a bit until they produce a working implementation.

Mistral 3.1 support is now official in vLLM, at least when using the Mistral model format, not HF weights. How would one use this quant with vLLM, since it's not in the Mistral format?

vLLM now supports Mistral3 in HF format:
https://github.com/vllm-project/vllm/commit/51d7c6a2b23e100cd9e7d85b8e7c0eea656b331e

You can use this model after installing the nightly version:

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
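
After that, serving the HF-format weights should work without the Mistral-format flags; something like the following should do (substitute this repo's model ID, and note I have not checked whether this particular quant needs extra flags):

vllm serve <this-quant-repo-id>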
IST Austria Distributed Algorithms and Systems Lab org

@hfmon the chat_template in the original Mistral repo was updated. I pushed it here, and now one can run inference via HF as well.

thanks for the very quick reaction! :)
