Add support for empty think block injection in chat template

### Description

This PR adds support for the `enable_thinking` parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.
### Why it's needed

Many inference frameworks (SGLang, vLLM) and applications need to control whether models use reasoning steps. The `enable_thinking` parameter provides a standardized way to:
- Improve inference speed when reasoning isn't needed
- Ensure consistent output structure for parsing
- Match behavior across different model families
### Usage

```python
# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # or omit for default
)
# Output: <|Assistant|>

# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Output: <|Assistant|><think>\n\n</think>\n\n
```
### Implementation

The change adds a single line to inject an empty think block when `enable_thinking=False`:

```jinja
{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}
```

This follows Qwen3's approach, where:

- `enable_thinking=False` strictly disables reasoning by injecting an empty think block
- The empty block signals to the model to skip chain-of-thought generation
- This is recommended for efficiency-critical scenarios
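For illustration, here is a minimal sketch (just the new conditional plus the assistant tag, not the full DeepSeek template) of how the added line behaves when rendered directly with `jinja2`, the engine `apply_chat_template` uses:

```python
# Minimal sketch: render only the new conditional to show the three cases
# (enable_thinking undefined / True / False).
from jinja2 import Template

snippet = Template(
    "<|Assistant|>"
    "{% if enable_thinking is defined and enable_thinking is false %}"
    "{{'<think>\\n\\n</think>\\n\\n'}}"
    "{% endif %}"
)

print(repr(snippet.render()))                       # '<|Assistant|>' (undefined: unchanged)
print(repr(snippet.render(enable_thinking=True)))   # '<|Assistant|>' (unchanged)
print(repr(snippet.render(enable_thinking=False)))  # '<|Assistant|><think>\n\n</think>\n\n'
```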
### Backward Compatibility

Fully backward compatible - only affects behavior when `enable_thinking=False` is explicitly set.
This version is rebased onto the latest version of the chat template, which was updated under me, and also fixes some copy-paste errors I had.
Also - maybe Hugging Face should make the chat template its own file, instead of a string in the tokenizer config.
Actually that's the case in recent transformers! The file is `chat_template.jinja` now. It's a real Jinja file, so you get syntax highlighting, formatting, etc.
cc @Rocketknight1 who's been leading this change.
Yes! The `chat_template` key in the tokenizer config can be replaced with a plaintext `chat_template.jinja` file. It's the default save format now - if you load and resave a tokenizer, that's what you'll get in future. We expect model repos will transition over time, but we'll keep supporting the legacy format for loading for the foreseeable future.
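To make that concrete, a quick sketch (assuming a recent transformers release with this save behavior):

```python
# Load any tokenizer and resave it; recent transformers versions write the
# template out as chat_template.jinja instead of embedding it as a string
# in tokenizer_config.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
tok.save_pretrained("./resaved")
# ./resaved/ now contains chat_template.jinja alongside tokenizer_config.json
```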
Wow, thanks! <3
```text
python test.py  (base) 20.063s
✓ Connected to vLLM server
✓ Available models: ['/models/DeepSeek-R1-0528']
================================================================================
Testing DeepSeek-R1 enable_thinking parameter via vLLM API
Server: http://localhost:8001/v1
Model: /models/DeepSeek-R1-0528
================================================================================

[Test 1] enable_thinking not defined (default)
------------------------------------------------------------
Response preview: <think>
I need to calculate 25 times 17. I'll show my work step by step. I know multiplication is like repeated addition, so 25 times 17 means adding ...
✓ Contains <think>: True
✓ PASSED: Model generates <think> tags as expected

[Test 2] enable_thinking = True
------------------------------------------------------------
Response preview: <think>
I need to calculate 25 times 17. I should show my work, so I'll think step by step. I know multiplication is like repeated addition, so 25 tim...
✓ Contains <think>: True
✓ PASSED: Model generates <think> tags as expected

[Test 3] enable_thinking = False
------------------------------------------------------------
Response preview: I need to multiply 25 by 17 and show my work. I recall that multiplication is like repeated addition, so 25 times 17 means adding 25 seventeen times. ...
✓ Contains <think>: False
✓ PASSED: Model skips <think> tags as expected

================================================================================
TEST SUMMARY
================================================================================
enable_thinking undefined : PASSED (should have <think>)
enable_thinking=True      : PASSED (should have <think>)
enable_thinking=False     : PASSED (should NOT have <think>)

✓ ALL TESTS PASSED!
```
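For reference, a minimal sketch of the kind of check `test.py` performs (an assumed reconstruction, not the author's exact script; it uses vLLM's OpenAI-compatible API and its `chat_template_kwargs` passthrough):

```python
# Hypothetical reconstruction of the test: query the vLLM server three ways
# and check whether the response contains <think>. chat_template_kwargs is
# vLLM's mechanism for passing extra variables into the chat template.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
prompt = [{"role": "user", "content": "What is 25 times 17? Show your work."}]

cases = [
    ("enable_thinking undefined", {}, True),
    ("enable_thinking=True", {"chat_template_kwargs": {"enable_thinking": True}}, True),
    ("enable_thinking=False", {"chat_template_kwargs": {"enable_thinking": False}}, False),
]

for label, extra, expect_think in cases:
    resp = client.chat.completions.create(
        model="/models/DeepSeek-R1-0528",
        messages=prompt,
        extra_body=extra,
    )
    has_think = "<think>" in (resp.choices[0].message.content or "")
    status = "PASSED" if has_think == expect_think else "FAILED"
    print(f"{label}: contains <think> = {has_think} -> {status}")
```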
But take note! During the course of this testing, I discovered another bug in the chat template.
```jinja
{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}
```

should instead be

```jinja
{% if add_generation_prompt and ns.is_last_user and not ns.is_tool %}
```
I submitted that fix here: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/discussions/80
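A stripped-down sketch of why that one-word change matters (a simplified stand-in for the real template, with the `ns.*` state flattened into plain variables):

```python
# When the last message comes from the user (the normal generation case),
# the buggy condition suppresses the <|Assistant|> generation prompt and
# the fixed condition emits it.
from jinja2 import Template

buggy = Template(
    "{% if add_generation_prompt and not is_last_user and not is_tool %}"
    "<|Assistant|>{% endif %}"
)
fixed = Template(
    "{% if add_generation_prompt and is_last_user and not is_tool %}"
    "<|Assistant|>{% endif %}"
)

ctx = dict(add_generation_prompt=True, is_last_user=True, is_tool=False)
print(repr(buggy.render(**ctx)))  # '' (generation prompt missing)
print(repr(fixed.render(**ctx)))  # '<|Assistant|>'
```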
Thanks to Hot Aisle for lending me the excellent MI300X node used to validate this.