Setting model_max_length in tokenizer appears to have no effect
#14
by fCola - opened
Hi and thanks for the great model!
It seems that the setting for using long-context embeddings (in Transformers) is not taking effect. Doing this:
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
print(tokenizer.model_max_length)
outputs 512. I also checked that embedding two texts that are identical up to the 512th token produces the same embedding, which suggests the input is being truncated at 512 tokens. However, it works fine with the Sentence Transformers example. Am I doing something wrong? Thanks!
Hm, does it work if you do the following?
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenizer.model_max_length = 8192
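For reference, here is a minimal sketch of that workaround (assuming the transformers library and the checkpoint name from the example above; swap in the actual long-context checkpoint if it differs), which also checks that truncation picks up the new limit:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenizer.model_max_length = 8192  # override the 512 stored in the checkpoint's tokenizer config
print(tokenizer.model_max_length)  # 8192

# truncation=True falls back to model_max_length, so a long input is no longer cut at 512 tokens
ids = tokenizer("word " * 2000, truncation=True)["input_ids"]
print(len(ids))  # > 512 now, rather than exactly 512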
Yes, setting it this way makes it work in Transformers too! Should I close this?
Yes, thanks for bearing with us!
zpn changed discussion status to closed