Fast Tokenization for Phi3-small
#29 by mfajcik - opened
Dear authors,
I was wondering why this model only provides a basic (slow) tokenizer and not a fast tokenizer.
```python
>>> from transformers import AutoTokenizer
>>> y = AutoTokenizer.from_pretrained("microsoft/Phi-3-small-128k-instruct", trust_remote_code=True, use_fast=True)
>>> y.is_fast
False
```
Unfortunately, this makes the model unusable in cases that require the token offset_mapping for reversible tokenization (as is currently the case in my research).
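For concreteness, here is a minimal sketch of what fails: in transformers, `return_offsets_mapping` is only supported by fast (`PreTrainedTokenizerFast`) tokenizers, so the slow tokenizer raises `NotImplementedError`.

```python
from transformers import AutoTokenizer

# Slow tokenizer: requesting character offsets raises NotImplementedError,
# because offset mapping is only implemented by the Rust-backed fast tokenizers.
tok = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-small-128k-instruct", trust_remote_code=True
)
try:
    tok("Hello world", return_offsets_mapping=True)
except NotImplementedError as err:
    print(err)
```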
Is this intentional? Other Phi tokenizers are fast, e.g., "microsoft/Phi-3-mini-4k-instruct".
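For comparison, the mini tokenizer loads as fast and supports offsets (the spans shown below are illustrative, not copied from a run):

```python
>>> from transformers import AutoTokenizer
>>> fast = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
>>> fast.is_fast
True
>>> fast("Hello world", return_offsets_mapping=True)["offset_mapping"]
[(0, 5), (5, 11)]  # illustrative (start, end) character spans per token
```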
Thank you for any advice.
Best,
Martin
mfajcik changed discussion title from Fast Tokenization for Phi to Fast Tokenization for Phi3-small