Wrong context length limit for HuggingChat?

#68
by vesjolovam - opened

Whenever the context length reaches 4096, the model stops generating mid-response, and any new prompt is rejected with the following error (the 4107 can be any value, depending on the prompt size):

Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 4107 inputs tokens and 0 max_new_tokens

GopiUppari (Google org)

Hi @vesjolovam ,

On HuggingChat, the model can only handle a total of 4096 tokens, meaning the input plus the response. If the input alone crosses that limit, the request is rejected with an error. To fix it, either shorten the input, reduce max_new_tokens, or switch to a model that supports longer inputs.
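
If it helps, here is a minimal sketch of how you can check this locally before sending a prompt, assuming the Gemma 3 tokenizer from transformers; the model id, token budget, and helper names below are illustrative, not how HuggingChat itself does it:

```python
# Minimal sketch (not the HuggingChat implementation): count tokens locally and
# trim a prompt so that input tokens + max_new_tokens stays within the limit.
from transformers import AutoTokenizer

MODEL_ID = "google/gemma-3-27b-it"  # assumed checkpoint, for illustration only
CONTEXT_LIMIT = 4096                # total budget reported by the error message
MAX_NEW_TOKENS = 512                # whatever you plan to request for the reply

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def fits_in_context(prompt: str) -> bool:
    """True if prompt tokens + max_new_tokens stay within CONTEXT_LIMIT."""
    n_input = len(tokenizer(prompt)["input_ids"])
    return n_input + MAX_NEW_TOKENS <= CONTEXT_LIMIT

def truncate_to_fit(prompt: str) -> str:
    """Keep only the most recent tokens so the request passes validation."""
    budget = CONTEXT_LIMIT - MAX_NEW_TOKENS
    ids = tokenizer(prompt)["input_ids"]
    if len(ids) <= budget:
        return prompt
    # Drop the oldest tokens; decoding back to text is approximate but close.
    return tokenizer.decode(ids[-budget:], skip_special_tokens=True)
```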

Thank you.

Hello @GopiUppari, thank you for responding.

But why such a limitation? I assumed it must be a bug in the setup, since that is a very small context length for a modern model, and no other model on HuggingChat seems to have a similar limit. Even a regular conversation, without any long text attached to the prompts, would exceed the context in about 5-8 turns, and if longer content is sent (e.g. a relatively small codebase or a text document), chances are it wouldn't fit in the context at all, not even for a single turn.

Or is it just intended as a small demo with no serious capabilities? In AI Studio, a single response alone is capped at 8192 tokens, while here the whole context is capped at half of that, roughly 3% of the model's actual capacity. Or is this not the original model, but some converted version with a conversion issue that severely limits the context, as seems to be reported for many other converted versions of Gemma 3?
