Regarding max_position_embeddings
#1, opened by hanshounsu
Hello, thank you for the great work. I have a question regarding max_position_embeddings.
You limited the value to 32768 while porting the weights, but I don't see any reason why it wouldn't work at a larger value (131072). As far as I know, they use rotary embeddings, which are not trainable, so if the model is configured identically we can increase the value freely.
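As a minimal sketch of why this should hold (assuming the standard RoPE formulation with base 10000; the actual model's rotary parameters may differ), the rotary tables are a deterministic function of position, so a longer table is a pure extension of the shorter one:

```python
import numpy as np

def rotary_tables(seq_len, dim, base=10000.0):
    # Rotary position embeddings are computed, not learned: each position
    # maps to fixed cos/sin angles, so no weights depend on seq_len.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(np.arange(seq_len), inv_freq)
    return np.cos(angles), np.sin(angles)

cos_short, _ = rotary_tables(32768, 64)
cos_long, _ = rotary_tables(131072, 64)

# The first 32768 rows are identical: extending max length only
# appends new rows, leaving existing positions untouched.
assert np.allclose(cos_long[:32768], cos_short)
```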
Thanks again for the great work!
Sorry for the late reply.
The reason I reduced the max_position_embeddings value is that the larger value gives an OOM error. Although I am not entirely sure myself, I think it's because JAX's jit has to compile the model the first time it is loaded, and the value that HuggingFace uses for the first initialization on the sequence dimension is max_position_embeddings.
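To illustrate the scale involved, here is a back-of-the-envelope sketch (assuming fp32 attention scores and a single head; the actual Flax initialization and attention implementation may differ). If initialization materializes a seq_len x seq_len score matrix, memory grows quadratically with max_position_embeddings:

```python
def attn_scores_bytes(seq_len, dtype_bytes=4):
    # One (seq_len, seq_len) fp32 score matrix for a single head.
    return seq_len * seq_len * dtype_bytes

GIB = 1024 ** 3

# 32768 positions -> 4 GiB per head; 131072 positions -> 64 GiB per head,
# which easily exceeds accelerator memory during the initial jit trace.
print(attn_scores_bytes(32768) / GIB)   # 4.0
print(attn_scores_bytes(131072) / GIB)  # 64.0
```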