Question about 1-bit quant (#2)
opened by ThomasBaruzier
Hello,
You are claiming your 1-bit quant is "custom".
Could you please elaborate on how it was made, and whether it is higher quality than a traditional IQ1_S or IQ1_M quant?
Thanks.
Only ~92% of the weights are 1-bit, so I had to rewrite llama.cpp to do that custom quant. I also have not uploaded them yet.
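For a rough sense of what that mix means for overall size, here is a back-of-the-envelope sketch. The bits-per-weight figures (≈1.56 bpw for the 1-bit portion, 5.5 bpw for the rest) are illustrative assumptions, not the actual types used in this quant:

```python
# Back-of-the-envelope estimate of the effective bits-per-weight (bpw)
# for a mixed quant where most weights are 1-bit and the rest stay at
# a higher-precision type. The bpw values below are illustrative
# assumptions, not the actual recipe.

def effective_bpw(frac_low: float, bpw_low: float, bpw_high: float) -> float:
    """Weighted average bpw for a two-type mix."""
    return frac_low * bpw_low + (1.0 - frac_low) * bpw_high

def approx_size_gb(n_params_billion: float, bpw: float) -> float:
    """Approximate on-disk size in GB for the weight tensors alone."""
    return n_params_billion * 1e9 * bpw / 8 / 1e9

if __name__ == "__main__":
    # ~92% of weights at ~1.56 bpw (IQ1_S-like), ~8% at ~5.5 bpw (assumed)
    bpw = effective_bpw(0.92, 1.56, 5.5)
    print(f"effective bpw: {bpw:.2f}")  # ~1.88 with these assumptions
    print(f"approx size for 70B params: {approx_size_gb(70, bpw):.1f} GB")
```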
Thank you for the answer.
If you plot model size vs. PPL for the two closest quants, would this custom quant yield a lower, equal, or higher perplexity? If there is a real benefit, it might be worth sharing your findings in the llama.cpp repo.
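Something like this rough sketch would make the comparison concrete; all sizes and PPL values below are made-up placeholders to be replaced with measured numbers:

```python
# Sketch of the size-vs-PPL comparison: plot the two nearest stock quants
# and the custom one, and check whether the custom point falls below the
# line between them. All numbers here are placeholders, not measurements.
import matplotlib.pyplot as plt
import numpy as np

quants = {
    "IQ1_S":  {"size_gb": 15.0, "ppl": 12.0},  # placeholder
    "IQ1_M":  {"size_gb": 16.5, "ppl": 10.5},  # placeholder
    "custom": {"size_gb": 15.8, "ppl": 11.0},  # placeholder
}

# PPL expected from linear interpolation between the stock quants at the
# custom quant's size; a lower measured PPL means the mix is a net win.
x0, y0 = quants["IQ1_S"]["size_gb"], quants["IQ1_S"]["ppl"]
x1, y1 = quants["IQ1_M"]["size_gb"], quants["IQ1_M"]["ppl"]
xc, yc = quants["custom"]["size_gb"], quants["custom"]["ppl"]
expected = np.interp(xc, [x0, x1], [y0, y1])
print(f"custom PPL {yc:.2f} vs interpolated {expected:.2f}")

for name, q in quants.items():
    plt.scatter(q["size_gb"], q["ppl"], label=name)
plt.plot([x0, x1], [y0, y1], linestyle="--", color="gray")
plt.xlabel("Model size (GB)")
plt.ylabel("Perplexity (lower is better)")
plt.legend()
plt.show()
```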