vLLM reasoning parser
Hello,
As of vLLM v0.9.1, the available reasoning parsers are the following: deepseek_r1, granite, qwen3.
None of them seems to correctly parse the reasoning produced by mistralai/Magistral-Small-2506. Consequently, the reasoning ends up in the content field instead of the reasoning_content field.
Is this a work in progress on the vLLM side or a misconfiguration on mine? I don't see any Mistral parser in the vLLM source code: https://github.com/vllm-project/vllm/blob/main/vllm/reasoning/__init__.py
With qwen3 I'm getting: RuntimeError: Qwen3 reasoning parser could not locate think start/end tokens in the tokenizer!
With deepseek_r1: RuntimeError: DeepSeek R1 reasoning parser could not locate think start/end tokens in the tokenizer!
With granite, or with no reasoning parser set, the vLLM container starts but the reasoning content is not parsed into a separate field.
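Here's roughly how I'm checking the response (the port and prompt are just placeholders):

import requests

# Placeholder port/prompt; the server is started with the podman command below.
payload = {
    "model": "mistralai/Magistral-Small-2506",
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    "max_tokens": 1024,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
message = resp.json()["choices"][0]["message"]

# Expected with a working reasoning parser: chain of thought in reasoning_content,
# final answer in content. Observed: reasoning_content is missing/None and the
# whole reasoning ends up in content.
print(message.get("reasoning_content"))
print(message["content"])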
Here's how I'm starting the container:
podman run --name mistralai-Magistral-Small-2506 \
-e VLLM_USE_V1=1 \
-e VLLM_TARGET_DEVICE=cuda \
-d vllm/vllm-openai:v0.9.1 \
--model mistralai/Magistral-Small-2506 --revision 48c97929837c3189cb3cf74b1b5bc5824eef5fcc \
--enable-prefix-caching --generation-config vllm \
--guided-decoding-backend xgrammar \
--gpu-memory-utilization 0.89 --max-model-len 31999 --max-num-batched-tokens 32000 \
--max-num-seqs 128 \
--enable-reasoning --reasoning-parser granite \
--tool-call-parser mistral --chat-template /root/chat_templates/chat_template_magistral.jinja \
--enable-auto-tool-choice --tokenizer_mode mistral --load_format mistral --config_format mistral
Hi,
It is not an error on your side. AFAIK, the parsers supported by vLLM look for special tokens whose string representations are <think> and </think>. Magistral does not use special tokens to mark thinking. The token ids that represent <think> and </think> can vary and depend on the context; for example, <think> could be encoded as ["<think", ">\n"].
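If you want to verify this yourself, something like the sketch below (assuming the model repo ships a Hugging Face tokenizer; exact ids and splits depend on the revision) shows whether the tags map to single token ids:

from transformers import AutoTokenizer

# Assumes the repo ships an HF tokenizer; exact ids/splits depend on the revision.
tok = AutoTokenizer.from_pretrained("mistralai/Magistral-Small-2506")

for tag in ("<think>", "</think>"):
    ids = tok.encode(tag, add_special_tokens=False)
    print(tag, ids, tok.convert_ids_to_tokens(ids))

# Parsers like deepseek_r1/qwen3 expect each tag to be a single special token id;
# if encode() returns several ids here, there is no single token for the parser
# to hook into.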
For now, we don't plan to add a parser, but that might change in the future.