How many bits of quantization are enough for code generation tasks?

#5
by luweigen - opened

The question isn't quite right. The real question is how low you can go while still avoiding hallucinations in the model's output. Large models can't be used at low quants the way smaller 32-70B models can. That was obvious with the first open large model, Falcon 180B, which was unusable at low quants. The same was true of Meta's 405B, which was also about 2x slower than DeepSeek (they have really improved speed). Q6 is the practical sweet spot, because Q8 requires roughly an additional 100 GB of RAM. Q5 is decent too, but I've noticed a difference between Q5 and Q6: the latter is slightly better and almost never hallucinates. I've tested Q6 already; it still uses the same 567 GB of RAM/VRAM locally.
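To put rough numbers behind those RAM figures, here is a minimal sketch estimating weight storage at different quant levels. The 671B parameter count is an assumption matching a DeepSeek-scale model, and the bits-per-weight values are approximate effective figures for llama.cpp-style K-quants (actual file sizes vary by tensor mix):

```python
# Rough memory-footprint estimate for model weights at different quant levels.
# Assumptions: 671e9 parameters (DeepSeek-scale), approximate effective
# bits-per-weight for llama.cpp-style quants; real sizes vary by tensor mix.

PARAMS = 671e9  # assumed parameter count

BITS_PER_WEIGHT = {  # approximate effective bits per weight
    "Q5_K": 5.5,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}

def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return params * bpw / 8 / 1e9

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gb(PARAMS, bpw):.0f} GB")
```

Under these assumptions, Q6 comes out in the same ballpark as the 567 GB observed locally, and the gap between Q6 and Q8 is on the order of 100+ GB, consistent with the comment above (KV cache and runtime overhead add to these figures).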
