davidberenstein1957 
posted an update 1 day ago

Maybe not with the trailing spaces?

·

Thanks. It is intended, though, as outlined in the chat template format in the paper: https://arxiv.org/pdf/2501.12948

You should be using <|begin▁of▁sentence|><|User|> instead of <|begin▁of▁sentence|>User:

from transformers import AutoTokenizer

# Load the tokenizer, which ships with the model's own chat template
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
chat = [
  {"role": "user", "content": "EXAMPLE USER TURN 1"},
  {"role": "assistant", "content": "EXAMPLE MODEL TURN 1"},
  {"role": "user", "content": "EXAMPLE USER TURN 2"},
]

# Render the prompt as a string; add_generation_prompt=True appends the assistant marker
print(tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False))
# <|begin▁of▁sentence|><|User|>EXAMPLE USER TURN 1<|Assistant|>EXAMPLE MODEL TURN 1<|end▁of▁sentence|><|User|>EXAMPLE USER TURN 2<|Assistant|>
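If you want to see the structure without downloading the tokenizer, here is a rough sketch that rebuilds the same string by hand. `render_chat` is a hypothetical helper, not part of transformers, and the token strings are copied as they are rendered in this thread. It only mirrors the output above for plain user/assistant turns; the real template may also handle system prompts and other cases, so prefer `apply_chat_template` in practice.

```python
# Hypothetical helper: mirrors the rendered output shown above,
# not the model's actual Jinja chat template.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def render_chat(chat, add_generation_prompt=True):
    """Concatenate user/assistant turns using the special tokens above."""
    parts = [BOS]
    for message in chat:
        role, content = message["role"], message["content"]
        if role == "user":
            # User turns are prefixed with the user token, no "User:" text
            parts.append(USER + content)
        elif role == "assistant":
            # Completed assistant turns are closed with the end-of-sentence token
            parts.append(ASSISTANT + content + EOS)
        else:
            # Assumption: other roles are out of scope for this sketch
            raise ValueError(f"unsupported role in this sketch: {role}")
    if add_generation_prompt:
        # Leave the prompt open so the model writes the next assistant turn
        parts.append(ASSISTANT)
    return "".join(parts)


chat = [
    {"role": "user", "content": "EXAMPLE USER TURN 1"},
    {"role": "assistant", "content": "EXAMPLE MODEL TURN 1"},
    {"role": "user", "content": "EXAMPLE USER TURN 2"},
]

prompt = render_chat(chat)
print(prompt)
```

Note that nothing in this sketch inserts spaces or newlines between the special tokens and the turn content.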
·

It looks like the Mistral V3 Tekken format.