Fix chat template in case of multiple assistant messages and no thinking

#9

Previously, when a conversation contained multiple assistant messages, applying the tokenizer's chat template with enable_thinking=False gave inconsistent output: the first assistant message was rendered without thinking tokens, while the second one still received an empty <think>...</think> block.

For example,

messages = [
    {'role': 'user', 'content': 'i am user 1'},
    {'role': 'assistant', 'content': 'i am assistant 1'},
    {'role': 'user', 'content': 'i am user 2'},
    {'role': 'assistant', 'content': 'i am assistant 2'},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    truncation=True,
    return_tensors='pt',
    enable_thinking=False,
).squeeze(0)

Decoding these input_ids gives the following output:
<|im_start|>user\ni am user 1<|im_end|>\n<|im_start|>assistant\ni am assistant 1<|im_end|>\n<|im_start|>user\ni am user 2<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\ni am assistant 2<|im_end|>\n

With the fixed template, the decoded output is:
<|im_start|>user\ni am user 1<|im_end|>\n<|im_start|>assistant\ni am assistant 1<|im_end|>\n<|im_start|>user\ni am user 2<|im_end|>\n<|im_start|>assistant\ni am assistant 2<|im_end|>\n
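To check a decoded string for this inconsistency without reading it by eye, a small helper along these lines may be useful. It is only a sketch: the name assistant_turns_with_think and the regular expression are illustrative and not part of this PR, and it assumes the ChatML-style markers shown in the outputs above.

```python
import re

# Matches one assistant turn in ChatML-style text and captures its body.
# Assumption: turns are delimited exactly as in the decoded outputs above.
ASSISTANT_TURN = re.compile(
    r"<\|im_start\|>assistant\n(.*?)<\|im_end\|>", re.DOTALL
)


def assistant_turns_with_think(decoded):
    """Return one bool per assistant turn: does its body start with <think>?"""
    return [body.startswith("<think>") for body in ASSISTANT_TURN.findall(decoded)]


# The two decoded outputs from the description, written as Python strings.
before_fix = (
    "<|im_start|>user\ni am user 1<|im_end|>\n"
    "<|im_start|>assistant\ni am assistant 1<|im_end|>\n"
    "<|im_start|>user\ni am user 2<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n\n</think>\n\ni am assistant 2<|im_end|>\n"
)
after_fix = (
    "<|im_start|>user\ni am user 1<|im_end|>\n"
    "<|im_start|>assistant\ni am assistant 1<|im_end|>\n"
    "<|im_start|>user\ni am user 2<|im_end|>\n"
    "<|im_start|>assistant\ni am assistant 2<|im_end|>\n"
)

print(assistant_turns_with_think(before_fix))  # [False, True]
print(assistant_turns_with_think(after_fix))   # [False, False]
```

A mixed result such as [False, True] indicates that the assistant turns were rendered inconsistently; with enable_thinking=False, every entry is expected to be False.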

I ran into the same issue as well. After manually applying the chat_template from this PR, everything worked correctly. Hope this gets merged soon!

Happy this was useful! In case anyone needs it, here is a copy of the model with this issue resolved that can be downloaded directly:

https://huggingface.co/VityaVitalich/Qwen3-1.7B

