Why 32k context and not 128k like the original model?

#1
by yehiaserag - opened

First of all, I'd like to thank you for all you are doing for the open source community.
I noticed the original model has 128k context, while this one is 32k by default and requires RoPE scaling to go beyond that.
The base model, Qwen2.5-Coder-32B-Instruct, also has 128k context.

Is it possible to provide a 128k context quant?

Actually, if you check the original, it also ships with 32k context and requires RoPE scaling:

https://huggingface.co/all-hands/openhands-lm-32b-v0.1/blob/1ce6c6d98200f19b24e138e64d43481b5ccdf208/config.json#L13
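For reference, here's a minimal sketch of how that RoPE scaling could be enabled with transformers, following the YaRN convention the Qwen2.5 model cards describe (a 4.0 factor over the native 32,768-token window). The exact key names depend on your transformers version, so treat this as an assumption rather than a verified recipe:

```python
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_ID = "all-hands/openhands-lm-32b-v0.1"

# The shipped config caps max_position_embeddings at 32768 (see the link above).
config = AutoConfig.from_pretrained(MODEL_ID)

# Qwen2.5-style YaRN scaling: a factor of 4.0 over the native 32,768-token
# window extends the usable context to roughly 131,072 tokens.
config.rope_scaling = {
    "rope_type": "yarn",  # older transformers versions use the key "type"
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, config=config)
```

For the GGUF quants, llama.cpp exposes similar knobs at runtime (`--rope-scaling yarn`, `--rope-scale`, `--yarn-orig-ctx`), so the same 32k-native quant can typically be stretched to longer contexts without a separate 128k quant.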

Sorry about that, I was misled by the model card text, which didn't mention it.

Thanks a lot @bartowski

No problem :D
