vLLM launch command?
Hi, and thanks for the quantisation!
I tried running it with vLLM and the suggested version of transformers, but got the error `ValueError: Mistral3ForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.`
Setting the engine to V0 does not work either. The vLLM repo says that only the Mistral format is supported for loading, but adding all the flags like `--load-format mistral` etc. does not seem to work (roughly what I'm running is below). Could you please help with that?
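For reference, this is roughly the launch I'm attempting; just a sketch, where `<path-to-this-quant>` is a placeholder and the mistral-format flags are taken from the vLLM docs:

```bash
# <path-to-this-quant> is a placeholder for this quant's local path or repo ID
VLLM_USE_V1=0 vllm serve <path-to-this-quant> \
    --tokenizer-mode mistral \
    --config-format mistral \
    --load-format mistral
```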
@nfunctor
Mistral3 support is WIP in vLLM. One has to wait a bit until they produce a working implementation.
Mistral 3.1 support is official here, at least when using the Mistral model format, not HF weights. How would one use this quant with vLLM, since it's not in Mistral format?
vLLM now supports the Mistral3 HF format:
https://github.com/vllm-project/vllm/commit/51d7c6a2b23e100cd9e7d85b8e7c0eea656b331e
You can use this model after installing the nightly version:
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
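After that, a plain launch with the default HF config should work. A minimal sketch, where `<this-repo-id>` is a placeholder for this quant's repo:

```bash
# <this-repo-id> is a placeholder; --max-model-len is optional, useful if GPU memory is tight
vllm serve <this-repo-id> --max-model-len 32768
```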
@hfmon
the `chat_template` in the original Mistral repo was updated. I pushed it here, and now one can run inference via HF as well.
thanks for the very quick reaction! :)