tokenizer
Hi there, we do provide the tokenizers, just not necessarily in other formats. mistral-common is the official implementation of our tokenizers, with the exact tokenization process we can be sure works as expected. It's not that we won't release tokenizers; it's that we release them in the most accurate format, the one we're confident about. I hope this helps!
Just in case, this is part of the tokenizer: https://huggingface.co/mistralai/Magistral-Small-2506/blob/main/tekken.json
But it goes together with mistral-common,
the same way the HF tokenizer format goes with the transformers implementation of the tokenizer. The tokenizer is available, just in a different format.
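For anyone curious what that looks like in practice, here is a minimal sketch of encoding a chat with mistral-common, loading the tekken.json linked above; the file path and message content are just placeholders:

```python
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the official tokenizer from the tekken.json shipped with the model.
tokenizer = MistralTokenizer.from_file("tekken.json")

# Encode a chat request the same way the official stack does.
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="Hello!")])
)
print(tokenized.tokens)  # token ids
print(tokenized.text)    # the rendered prompt string
```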
That does help clarify. Another post by one of your teammates didn't go into that level of detail, but you've cleared it up. Thanks. Am I correct in understanding that when people like Unsloth upload their models, they're basically creating an HF-compatible tokenizer? I noticed that they have the traditional .json files, for example.
Would there possibly be better performance with your official format, or not really?
Thanks again.
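For context, loading one of those community HF-format tokenizers is just the usual transformers call; the repo id below is only an example, not a recommendation:

```python
from transformers import AutoTokenizer

# Example repo id; substitute whichever community upload you are actually using.
tok = AutoTokenizer.from_pretrained("unsloth/Magistral-Small-2506")

# Render and tokenize a chat with the bundled chat template.
ids = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=True,
)
print(ids)
```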
Does anyone know if the tokenizer is the same as the one used in Mistral Small 3 2503?
So that we can just copy-paste it.
Edit: I just fine-tuned and served the model using the Mistral Small 3 2503 tokenizer and it works; I just copy-pasted tokenizer.json and special_tokens_map.json.
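If you want a quick sanity check that the copied HF tokenizer matches the official one, something like this sketch works; the file path and local directory are placeholders, and the accessor for the raw encoder inside mistral-common is an assumption that may differ between library versions:

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from transformers import AutoTokenizer

# Placeholders: point these at the model's tekken.json and your local
# directory containing the copy-pasted tokenizer.json / special_tokens_map.json.
official = MistralTokenizer.from_file("tekken.json")
copied = AutoTokenizer.from_pretrained("./my-finetune-dir")

sample = "Quick check that both tokenizers agree on plain text."
hf_ids = copied.encode(sample, add_special_tokens=False)
# Reach into mistral-common for its raw encoder; attribute names here are
# an assumption based on current versions of the library.
mc_ids = official.instruct_tokenizer.tokenizer.encode(sample, bos=False, eos=False)
print("match:", hf_ids == mc_ids)
```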