add "{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}"

Add support for empty think block injection in chat template

Description

This PR adds support for the enable_thinking parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.

Why it's needed

Many inference frameworks (SGLang, vLLM) and applications need to control whether models use reasoning steps. The enable_thinking parameter provides a standardized way to:

  • Improve inference speed when reasoning isn't needed
  • Ensure consistent output structure for parsing
  • Match behavior across different model families (see the request sketch just below)
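
For example, with vLLM's OpenAI-compatible server the flag can be passed per request. This is a minimal sketch, assuming a vLLM build that forwards chat_template_kwargs to apply_chat_template; the server URL and model path are taken from the test run later in this thread:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="/models/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "What is 25 * 17? Show your work."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)  # expected: no <think> block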

Usage

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
messages = [{"role": "user", "content": "What is 25 * 17?"}]

# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # or omit for the default
)
# Rendered prompt ends with: <|Assistant|>

# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# Rendered prompt ends with: <|Assistant|><think>\n\n</think>\n\n

Implementation

The change adds a single line to inject an empty think block when enable_thinking=False:

{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}
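
The condition can be sanity-checked in isolation. A standalone sketch using the jinja2 package directly, outside the tokenizer (Jinja's `is false` test requires Jinja2 >= 2.11, which transformers already depends on):

from jinja2 import Template

snippet = Template(
    "{% if enable_thinking is defined and enable_thinking is false %}"
    "{{'<think>\\n\\n</think>\\n\\n'}}{% endif %}"
)
print(repr(snippet.render()))                       # '' - undefined, prompt unchanged
print(repr(snippet.render(enable_thinking=True)))   # '' - True, prompt unchanged
print(repr(snippet.render(enable_thinking=False)))  # '<think>\n\n</think>\n\n'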

This follows Qwen3's approach where:

  • enable_thinking=False strictly disables reasoning by injecting an empty think block
  • The empty block signals to the model to skip chain-of-thought generation
  • Recommended for efficiency-critical scenarios

Backward Compatibility

Fully backward compatible - only affects behavior when enable_thinking=False is explicitly set.

This version is rebased onto the latest version of the chat template (which was updated under me), and also fixes some copy-paste errors I had.

also - maybe huggingface should make the chat template its own file, instead of a string in the tokenizer config.

Actually that's the case in recent transformers! The file is chat_template.jinja now. It's a real Jinja file, so you get syntax highlighting, formatting, etc.

cc @Rocketknight1 who's been leading this change.

Yes! The chat_template key in the tokenizer config can be replaced with a plaintext chat_template.jinja file. It's the default save format now - if you load and resave a tokenizer, that's what you'll get in future. We expect model repos will transition over time, but we'll keep supporting the legacy format for loading for the foreseeable future.
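
A quick way to see this, as a sketch assuming a recent transformers release with the new save behavior:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
tok.save_pretrained("./resaved")
# ./resaved/ now contains chat_template.jinja holding the template,
# instead of a "chat_template" string inside tokenizer_config.json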

Wow, thanks! <3

$ python test.py
✓ Connected to vLLM server
✓ Available models: ['/models/DeepSeek-R1-0528']

================================================================================
Testing DeepSeek-R1 enable_thinking parameter via vLLM API
Server: http://localhost:8001/v1
Model: /models/DeepSeek-R1-0528
================================================================================

[Test 1] enable_thinking not defined (default)
------------------------------------------------------------
Response preview: <think>
I need to calculate 25 times 17. I'll show my work step by step. I know multiplication is like repeated addition, so 25 times 17 means adding ...

✓ Contains <think>: True
✓ PASSED: Model generates <think> tags as expected


[Test 2] enable_thinking = True
------------------------------------------------------------
Response preview: <think>
I need to calculate 25 times 17. I should show my work, so I'll think step by step. I know multiplication is like repeated addition, so 25 tim...

✓ Contains <think>: True
✓ PASSED: Model generates <think> tags as expected


[Test 3] enable_thinking = False
------------------------------------------------------------
Response preview: I need to multiply 25 by 17 and show my work. I recall that multiplication is like repeated addition, so 25 times 17 means adding 25 seventeen times. ...

✓ Contains <think>: False
✓ PASSED: Model skips <think> tags as expected

================================================================================
TEST SUMMARY
================================================================================
enable_thinking undefined : PASSED (should have <think>)
enable_thinking=True      : PASSED (should have <think>)
enable_thinking=False     : PASSED (should NOT have <think>)

✓ ALL TESTS PASSED!

But take note! In the course of this testing, I discovered another bug in the chat template.

{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}
should instead be
{% if add_generation_prompt and ns.is_last_user and not ns.is_tool %}
since the generation prompt should only be appended when the conversation ends with a user turn, not the other way around.

I submitted that fix here: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/discussions/80
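
To see why, here is a simplified sketch of just that condition (hypothetical, not the full DeepSeek template logic):

from jinja2 import Template

tmpl = Template(
    "{% set ns = namespace(is_last_user=messages[-1].role == 'user') %}"
    "{% if add_generation_prompt and ns.is_last_user %}<|Assistant|>{% endif %}"
)
print(tmpl.render(messages=[{"role": "user", "content": "hi"}],
                  add_generation_prompt=True))  # -> <|Assistant|>
print(tmpl.render(messages=[{"role": "assistant", "content": "hi"}],
                  add_generation_prompt=True))  # -> '' (nothing to prompt)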

Thanks to Hot Aisle for lending me the excellent MI300X node used to validate this.
