The UD-IQ2_XXS is surprisingly good, but it's worth knowing that it degrades gradually yet significantly after about 1000 tokens.
#9, opened by mmbela
I mean usage of the context window (prompt + answer). The degradation is clearly visible, especially when the model is asked to translate. But what it does with so few bits is still brilliant. It would be good to know whether this is a general characteristic of the method, with each quant showing it at its own level.
Interesting, glad you found it useful. I think as the context grows it may get slower and show some performance degradation, yes, but that's standard with other LLMs too.