Mistral3ForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.
👍 4 · 3 · #16 opened 5 months ago by pedrojfb99
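
A minimal sketch of the workaround named in the title, assuming a vLLM build that still ships the V0 engine; the model ID below is a placeholder for this repository's checkpoint:

```python
import os

# Opt out of the vLLM V1 engine before vLLM is imported, per the thread title.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams

# "your-org/your-mistral3-checkpoint" is a placeholder model ID.
llm = LLM(model="your-org/your-mistral3-checkpoint", tokenizer_mode="mistral")
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```
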
Set model_max_length to the model's maximum context length (131072 tokens)
#15 opened 5 months ago by x0wllaar
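
A hedged sketch of that change, editing a local copy of the tokenizer config so the tokenizer no longer truncates below the full context window; the file path is an assumption:

```python
import json

# Raise model_max_length to the full 131072-token context window in a local
# copy of tokenizer_config.json; the path is a placeholder for your snapshot.
path = "tokenizer_config.json"
with open(path) as f:
    cfg = json.load(f)
cfg["model_max_length"] = 131072
with open(path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```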

Problem with `mistral3` when loading the model
7 · #14 opened 5 months ago by r3lativo
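
For context, a minimal loading sketch of the kind the thread discusses, assuming a transformers release with `mistral3` support (v4.50 or later); the model ID is a placeholder:

```python
import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration

model_id = "your-org/your-mistral3-checkpoint"  # placeholder model ID
processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision load to fit on common GPUs
    device_map="auto",
)
```
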
Add chat_template to tokenizer_config.json
👍 1 · 1 · #11 opened 5 months ago by bethrezen
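
A hedged sketch of the requested fix, assigning a template on the tokenizer object and saving it back so the tokenizer files gain a chat_template; the model ID, output path, and template string are all placeholders:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/your-mistral3-checkpoint")  # placeholder ID

# Placeholder Jinja template; substitute the model's real chat format.
tok.chat_template = (
    "{% for message in messages %}"
    "[{{ message['role'] }}] {{ message['content'] }}\n"
    "{% endfor %}"
)
tok.save_pretrained("patched-tokenizer")  # persists the template with the tokenizer config
```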

Local deployment + test video
#9 opened 5 months ago by leo009

Can't wait for HF? Try chatllm.cpp
2 · 6 · #7 opened 5 months ago by J22
You did it again...
👍 39 · #4 opened 5 months ago by MrDevolver

HF Format?
🧠 ❤️ 33 · 41 · #2 opened 5 months ago by bartowski
