Fix chat template for multiple assistant messages when thinking is disabled
Previously, when the messages contained multiple assistant messages, applying the tokenizer's chat template with enable_thinking=False was inconsistent: the first assistant message was rendered without thinking tokens, but the second one received an empty thinking block (<think>\n\n</think>\n\n).
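For example, given the model's tokenizer and the following messages:
    # `tokenizer` is the model's tokenizer (e.g. loaded with AutoTokenizer.from_pretrained)
    messages = [
        {'role': 'user', 'content': 'i am user 1'},
        {'role': 'assistant', 'content': 'i am assistant 1'},
        {'role': 'user', 'content': 'i am user 2'},
        {'role': 'assistant', 'content': 'i am assistant 2'},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=False,
        truncation=True,  # the tokenizer argument is `truncation`, not `truncate`
        return_tensors='pt',
        enable_thinking=False,
    ).squeeze(0)  # drop the batch dimension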
Decoding these input_ids gives the following output, in which only the second assistant message gets an empty thinking block:
    <|im_start|>user\ni am user 1<|im_end|>\n<|im_start|>assistant\ni am assistant 1<|im_end|>\n<|im_start|>user\ni am user 2<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\ni am assistant 2<|im_end|>\n
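The asymmetry is easier to spot by splitting the decoded string into assistant turns and checking each one for a `<think>` block. The helper below is a hypothetical sketch, not part of this PR; it assumes the ChatML-style `<|im_start|>assistant\n` and `<|im_end|>` markers seen in the output above, and expects the string returned by `tokenizer.decode(input_ids)`.

```python
# Assumed ChatML-style markers, taken from the decoded output shown above.
ASSISTANT_START = "<|im_start|>assistant\n"
TURN_END = "<|im_end|>"


def think_blocks_per_assistant_turn(decoded: str) -> list[bool]:
    """Hypothetical helper: for each assistant turn, report whether it contains <think>."""
    flags = []
    # Everything after an assistant marker, up to the next <|im_end|>, is one assistant turn.
    for chunk in decoded.split(ASSISTANT_START)[1:]:
        turn = chunk.split(TURN_END, 1)[0]
        flags.append("<think>" in turn)
    return flags
```

For the pre-fix output above this returns [False, True]; with a correctly behaving template and enable_thinking=False, every entry should be False.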
With the fixed template, neither assistant message contains a thinking block:
    <|im_start|>user\ni am user 1<|im_end|>\n<|im_start|>assistant\ni am assistant 1<|im_end|>\n<|im_start|>user\ni am user 2<|im_end|>\n<|im_start|>assistant\ni am assistant 2<|im_end|>\n
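The difference between the two outputs can be captured by a simplified pure-Python model of the rendering. This is a hypothetical sketch, not the actual Jinja chat template: it assumes ChatML-style `<|im_start|>`/`<|im_end|>` markers and models the old behavior as prepending an empty thinking block to the last assistant message only, which is what the example output shows.

```python
def render_chatml(messages: list[dict], empty_think_on_last_assistant: bool) -> str:
    """Hypothetical, simplified model of the template output (not the real Jinja template)."""
    # Index of the final assistant message; the old behavior only affected this one.
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"), default=-1
    )
    parts = []
    for i, m in enumerate(messages):
        body = m["content"]
        if empty_think_on_last_assistant and i == last_assistant:
            # Old behavior: an empty thinking block is prepended to the last assistant turn only.
            body = "<think>\n\n</think>\n\n" + body
        parts.append(f"<|im_start|>{m['role']}\n{body}<|im_end|>\n")
    return "".join(parts)
```

With empty_think_on_last_assistant=True this reproduces the pre-fix output for the four-message example; with False it reproduces the fixed output, in which every assistant message is rendered the same way when thinking is disabled.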
I ran into the same issue as well. After manually applying the chat_template from this PR, everything worked correctly. Hope this gets merged soon!
Happy this was useful! In case anyone needs it, a model with this issue already resolved can be easily downloaded.