Update tokenizer_config.json

#67

add "{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}"

Add support for empty think block injection in chat template

Description

This PR adds support for the enable_thinking parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.

Why it's needed

Many inference frameworks (SGLang, vLLM) and applications need to control whether a model emits reasoning steps. The enable_thinking parameter provides a standardized way to do the following (see the request sketch after this list):

  • Improve inference speed when reasoning isn't needed
  • Ensure consistent output structure for parsing
  • Match behavior across different model families
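For example, assuming vLLM's OpenAI-compatible server, which can forward a chat_template_kwargs field from the request into the chat template (the endpoint, model id, and prompt below are placeholders, not part of this PR):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="your/model",  # placeholder model id
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    # Forwarded into the template context, where it triggers the new branch.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)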

Usage

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your/model")  # placeholder: use this repo's model id
messages = [{"role": "user", "content": "What is 2 + 2?"}]

# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # or omit for the default
)
# Prompt ends with: <|Assistant|>

# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Prompt ends with: <|Assistant|><think>\n\n</think>\n\n

Implementation

The change adds a single line to inject an empty think block when enable_thinking=False:

{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}

This follows Qwen3's approach where:

  • enable_thinking=False strictly disables reasoning by injecting an empty think block
  • The empty block signals the model to skip chain-of-thought generation
  • Recommended for efficiency-critical scenarios
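To see the branch in isolation, here is a minimal sketch that renders just the added fragment directly with the jinja2 library (which transformers uses for chat templates); the <|Assistant|> prefix stands in for the rest of the template:

from jinja2 import Template

# Only the added branch, prefixed with the assistant tag for context.
fragment = Template(
    "<|Assistant|>"
    "{% if enable_thinking is defined and enable_thinking is false %}"
    "{{'<think>\n\n</think>\n\n'}}"
    "{% endif %}"
)

print(repr(fragment.render()))                       # '<|Assistant|>'
print(repr(fragment.render(enable_thinking=True)))   # '<|Assistant|>'
print(repr(fragment.render(enable_thinking=False)))  # '<|Assistant|><think>\n\n</think>\n\n'

Note that Jinja's false test matches only the boolean False, so values like None or 0 leave the prompt unchanged; the injection only happens when the flag is explicitly set to False.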

Backward Compatibility

Fully backward compatible: the rendered prompt is unchanged unless enable_thinking=False is explicitly passed. The default path and enable_thinking=True both skip the new branch.
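A quick way to confirm this, sketched with the same placeholder model id as above: render the prompt with the flag omitted and with enable_thinking=True and check that they match.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your/model")  # placeholder model id
messages = [{"role": "user", "content": "Hi"}]

default = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
enabled = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)

# The new branch only fires on enable_thinking=False, so these must match.
assert default == enabled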

erichartford changed pull request status to closed
