Why 32k context and not 128k like the original model?
#1 opened by yehiaserag
First of all I'd like to thank you for all you are doing for the open source community.
I noticed the original model has 128k context, while this one defaults to 32k and requires RoPE scaling to go further.
Also, the base model, Qwen2.5-Coder-Instruct, has 128k context.
Is it possible to provide a 128k-context quant?
Actually, if you check the original, it also has a 32k native context and requires RoPE scaling to reach 128k.
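For reference, extending Qwen2.5-based models to 128k is usually done by adding a YaRN `rope_scaling` entry to the model's `config.json`; the values below (factor 4.0, since 4 × 32768 = 131072) follow what the Qwen2.5 model cards describe, but treat them as an assumption to verify against the specific model card:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

For GGUF quants run with llama.cpp, the same scaling can be requested at load time, e.g. `--rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 -c 131072`.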
Sorry about that, I was misled by the model card text, which didn't mention it.
Thanks a lot @bartowski
No problem :D