Update tokenizer_config.json (#67)
opened by erichartford
Add support for empty think block injection in chat template
Description
This PR adds support for the `enable_thinking` parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.
Why it's needed
Many inference frameworks (SGLang, vLLM) and applications need to control whether models emit reasoning steps. The `enable_thinking` parameter provides a standardized way to:
- Improve inference speed when reasoning isn't needed
- Ensure consistent output structure for parsing
- Match behavior across different model families
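To illustrate the parsing point, here is a small hypothetical helper (not part of this PR) that strips the injected empty think block so downstream parsers see a uniform completion regardless of the flag; the block text matches the string injected by the template:

```python
EMPTY_THINK = "<think>\n\n</think>\n\n"

def strip_empty_think(completion: str) -> str:
    """Remove a leading empty think block, if present, so parsing
    code behaves the same with enable_thinking on or off."""
    if completion.startswith(EMPTY_THINK):
        return completion[len(EMPTY_THINK):]
    return completion
```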
Usage
```python
# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # or omit for the default
)
# Output ends with: <|Assistant|>
```

```python
# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# Output ends with: <|Assistant|><think>\n\n</think>\n\n
```
Implementation
The change adds a single line to the chat template that injects an empty think block when `enable_thinking=False`:

```jinja
{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}
```
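The conditional can be sketched in plain Python (a mimic for illustration only; the real rendering happens in the tokenizer's Jinja template, and the `<|Assistant|>` token is taken from the usage examples above):

```python
def generation_prompt(enable_thinking=None) -> str:
    """Mimic of the template's conditional: inject an empty think
    block only when enable_thinking is explicitly False."""
    prompt = "<|Assistant|>"
    if enable_thinking is False:  # mirrors `is defined and ... is false`
        prompt += "<think>\n\n</think>\n\n"
    return prompt
```

Note that omitting the parameter (`None`) and passing `True` both leave the prompt unchanged, which is what keeps the change backward compatible.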
This follows Qwen3's approach, where:
- `enable_thinking=False` strictly disables reasoning by injecting an empty think block
- The empty block signals to the model to skip chain-of-thought generation
- This mode is recommended for efficiency-critical scenarios
Backward Compatibility
Fully backward compatible: behavior changes only when `enable_thinking=False` is explicitly set.
erichartford changed pull request status to closed