How much ctx was used when converting to GGUF?

by xJohn - opened

Hi,
Thank you for your work. I need to know how much context (ctx) was used when you converted the model to GGUF, because when I test the GGUF model it shows the error "Requested tokens (885563) exceed context window of 70208".

You can specify the max context size at load time using -c 128000. The error just tells you that the max context size you specified is 70208 and your API call exceeds that. The base model claims a 128000-token context, but reaching that likely requires sliding window attention and RoPE scaling for good results, both of which are implemented in llama.cpp.
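
If you are loading the GGUF through the llama-cpp-python bindings rather than the CLI, the equivalent of the -c flag is the n_ctx parameter. A minimal sketch, assuming the bindings are installed and using a placeholder model path:

```python
# Minimal sketch with llama-cpp-python; the model filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model-q4_k_m.gguf",  # hypothetical path to your converted GGUF
    n_ctx=128000,                    # request the full context window at load time
)

# Prompts longer than n_ctx tokens are rejected, which is what the
# "Requested tokens ... exceed context window" error reports.
out = llm("Hello, world!", max_tokens=32)
print(out["choices"][0]["text"])
```

Note that a larger n_ctx also increases memory usage for the KV cache, so only request as much context as you actually need.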

Thank you.

xJohn changed discussion status to closed
