How much ctx was used when converting to GGUF?

by xJohn - opened

Hi,
Thank you for your work. I need to know how much context (ctx) was used when you converted the model to GGUF, because when I test the GGUF model it shows the error "Requested tokens (885563) exceed context window of 70208".

You can specify the max context size at load time using -c 128000. The error just tells you that the max context size you specified is 70208 and your API call exceeds that. The base model claims a 128000-token context, but reaching that likely requires sliding window attention and RoPE scaling for good results, both of which are implemented in llama.cpp.
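
If you are loading the GGUF through the llama-cpp-python bindings rather than the CLI, the equivalent of the -c flag is the n_ctx parameter. A minimal sketch, assuming the bindings are installed and using a placeholder model path:

```python
# Minimal sketch with llama-cpp-python; the model filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model-q4_k_m.gguf",  # hypothetical path to your converted GGUF
    n_ctx=128000,                    # request the full context window at load time
)

# Prompts longer than n_ctx tokens are rejected, which is what the
# "Requested tokens ... exceed context window" error reports.
out = llm("Hello, world!", max_tokens=32)
print(out["choices"][0]["text"])
```

Note that a larger n_ctx also increases memory usage for the KV cache, so only request as much context as you actually need.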

Thank you.

xJohn changed discussion status to closed
