Bug in the tokenizer: Special tokens are not being encoded
#14 · opened by cassanof
Hello, I found a bug in the tokenizer: special tokens are not being encoded as such:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2-Instruct")
>>> print(tokenizer.eos_token)
[EOS]
>>> print(tokenizer.eos_token_id)
163585
>>> print(tokenizer.encode('[EOS]'))
[58, 85521, 60]
```
As shown above, the string `[EOS]` is split into three ordinary tokens instead of mapping to the single registered special-token id 163585. This will cause many issues downstream.
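For reference, here is a minimal, self-contained reproduction; the expected values come from the REPL session above, and the final `assert` expresses what correct behavior should look like (it fails with the current tokenizer):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2-Instruct")

# The token is registered as the EOS special token,
# and the token -> id mapping itself appears intact:
eos = tokenizer.eos_token                     # "[EOS]"
eos_id = tokenizer.eos_token_id               # 163585
print(tokenizer.convert_tokens_to_ids(eos))   # 163585

# But encode() splits the literal string into ordinary tokens
# instead of matching it as a single added special token:
print(tokenizer.encode(eos))                  # [58, 85521, 60]

# Expected: added special tokens survive encode() as one id.
assert tokenizer.encode(eos, add_special_tokens=False) == [eos_id], (
    "special token was split instead of encoded as a single id"
)
```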