No IQ2_XXS on purpose?

#5
by Kwissbeats

Hello, sorry to bother you. I really appreciate your work!

Since I was pleasantly surprised by how good the QwQ quant was, I wonder whether an IQ2_XXS version of Gemma is, or would be, less successful?

Regards

Yeah, it was a conscious decision; you have to put the cutoff somewhere 😅

What kind of card are you attempting to fit it on where 8.44GB is too big?

How much smaller could the IQ2_XXS be? If a 4060 with 8 GB could run a Gemma 27B quant, that might be interesting to someone, but my guess is that IQ2_XXS would come in at ~8.1 GB or something anyway.
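For anyone curious where estimates like that come from, here's a rough back-of-the-envelope sketch in Python. The bits-per-weight numbers are the approximate figures llama.cpp quotes for these quant types and the parameter count is rounded; treat them all as assumptions, not measurements from this repo.

```python
# Back-of-the-envelope GGUF size estimate: weights only, ignoring the
# token-embedding / output tensors that are typically kept at higher
# precision (a noticeable chunk for Gemma's large vocabulary), so treat
# these as rough lower bounds.

GIB = 1024 ** 3

def approx_weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB

n_params = 27e9  # Gemma 27B, rounded

for quant, bpw in [("IQ2_XXS", 2.06), ("IQ2_XS", 2.31), ("IQ3_XXS", 3.06)]:
    print(f"{quant:8s} ~{approx_weight_size_gib(n_params, bpw):.1f} GiB of weights")
```

Real files usually come out somewhat larger than the pure-weights figure because embeddings and a few other tensors stay at higher precision, which also narrows the gap between neighbouring IQ2 variants.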

Yeah, it was a conscious decision; you have to put the cutoff somewhere 😅
What kind of card are you attempting to fit it on where 8.44GB is too big?

I understand, thanks for replying :)

I'd rather not tell, but since you asked 😅: I am currently running QwQ with full context, partly on the CPU and partly on an NVIDIA 1060 with 8 GB of memory.
Most of the time I even reach for Q4_M.

Complex coding tasks can take a while, but it mainly fixes my Python/JavaScript syntax and indentation errors.

PS: this is in no way, shape, or form a request (well, it was, but now it's just curiosity about why);
as nkelly said, the size reduction would be small anyway.

Thanks

Hmm, fair enough. Yeah, if it were me, at that point I'd probably sacrifice some speed, settle for only a partial offload, and reach for a higher-quality quant; 27B at IQ2_XXS is a lot of loss, sadly. And like @nkelly13 mentioned, it would probably still not fit entirely on your GPU anyway.
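If it helps to reason about a partial offload, here's a hypothetical little helper; the file size, layer count, and VRAM reserve below are illustrative assumptions, not numbers from this thread, so measure against your own setup.

```python
# Hypothetical helper for reasoning about a partial offload: given the
# card's VRAM, a quant's file size, and the model's layer count, guess how
# many transformer layers fit on the GPU. Every number here is an assumed,
# illustrative placeholder.

def layers_that_fit(vram_gib: float, file_gib: float,
                    n_layers: int, reserve_gib: float = 1.5) -> int:
    """Rough count of offloadable layers, reserving room for KV cache etc."""
    per_layer_gib = file_gib / n_layers          # pretend layers are uniform
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))

# e.g. an 8 GiB card with a hypothetical ~12 GiB higher-quality 27B quant,
# assuming 46 layers:
print(layers_that_fit(vram_gib=8.0, file_gib=12.0, n_layers=46))
```

The result is roughly what you'd then pass to llama.cpp's `--n-gpu-layers`; the reserve for the KV cache is a guess and grows with context length, so err on the low side.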
