Why 32k context and not 128k like the original model?
#1 opened by yehiaserag
First of all I'd like to thank you for all you are doing for the open source community.
I noticed the original model has 128k context, while this one defaults to 32k and requires RoPE scaling to go further.
Also, the base model, Qwen2.5-Coder-Instruct, has 128k context.
Is it possible to provide a 128k-context quant?
Actually, if you check the original, it also has a 32k native context and requires RoPE scaling to reach 128k.
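For reference, extending Qwen2.5-based models to 128k is usually done by adding a YaRN `rope_scaling` entry to the model's `config.json`; the values below (factor 4.0, since 4 × 32768 = 131072) follow what the Qwen2.5 model cards describe, but treat them as an assumption to verify against the specific model card:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

For GGUF quants run with llama.cpp, the same scaling can be requested at load time, e.g. `--rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 -c 131072`.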
Sorry about that, I was misled by the model card text, which didn't mention it.
Thanks a lot @bartowski
No problem :D