The UD-IQ2_XXS quant is surprisingly good, but be aware that its quality degrades gradually yet significantly after about 1000 tokens.

by mmbela - opened Mar 30

Mar 30

I mean the usage of the context window (prompt + answer). The degradation is clearly visible, especially when it is asked to translate. Still, what it manages with so few bits is brilliant. It would be good to know whether this is a general characteristic of the method, affecting every quant to its own degree.

shimmyshimmer

Unsloth AI org Mar 31

Interesting, glad you found it useful. Yes, I think as the context grows it may get slower and show some performance degradation, but that's standard with other LLMs too.
