Error when running pipe: temp_state buffer is too small
#35
by StefanStroescu - opened
Hello,
I am trying to use the model to generate an answer from a context I provide, but when I get to text generation I get this error: temp_state buffer is too small.
I think it is because my prompt is quite large in terms of tokens, since the model works when I prompt it without the context.
I checked and it is not a resource issue (GPU or RAM), and Llama-2-13B-chat-GPTQ worked when prompted with the same context.
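For reference, this is roughly what I am running. It is a minimal sketch: the prompt template, placeholder context, and generation parameters are illustrative rather than my exact code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "TheBloke/Llama-2-70B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Placeholder context/question; in my case the context is a long document,
# so the full prompt ends up being a few thousand tokens.
context = "..."
question = "..."
prompt = (
    "Answer the question using the context below.\n\n"
    f"Context: {context}\n\nQuestion: {question}"
)

# This is the call that raises: RuntimeError: temp_state buffer is too small
result = pipe(prompt, max_new_tokens=256)
print(result[0]["generated_text"])
```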
Does anyone have any suggestions on how to solve this?
Thanks,
Thanks, Komposter43,
I don't know if it has anything to do with this (https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/29), but I noticed that the model only accepts inference requests under 2048 tokens.
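In case it helps, this is how I check the prompt length against that limit, plus a workaround I have seen mentioned for the exllama kernels used by GPTQ models. The `exllama_set_max_input_length` call and the 4096 value are assumptions on my part (it needs a recent auto-gptq), I have not verified them with this model yet.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-70B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "..."  # the full prompt, context included
n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"prompt length: {n_tokens} tokens")  # generation fails for me once this passes ~2048

# Workaround I have seen suggested for the exllama kernels (untested on my
# side, assumes a recent auto-gptq): enlarge the fixed input buffer after
# loading the model.
from auto_gptq import exllama_set_max_input_length

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model = exllama_set_max_input_length(model, max_input_length=4096)
```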