Could you help me quantize?

#23
by Alsebay - opened

https://huggingface.co/Alsebay/Narumashi-RT-11B-test
This is my model, but when I quantized it with llama.cpp, I needed to pass --vocab-type hfft to get the fp16 GGUF conversion to go through. The resulting Q4_K_M then showed a huge difference in performance. Thank you very much.

Hi! First, your repo has two sets of model weights (PyTorch and safetensors), and should only have one. That in itself should not have caused any vocabulary issues, though.

If convert.py works with hfft and the resulting model works correctly, then the conversion is almost certainly successful. If by "performance issue" you mean speed, that's probably a separate issue. If you mean the Q4_K_M performs much worse than the fp16, quality-wise, then that's not expected - but quants do lose quality, and some models are more sensitive to this than others. And since LLM output is not deterministic, it's often hard to tell whether performance is really worse or the result is just randomly bad.
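The quality loss mentioned above can be illustrated with a toy 4-bit block quantizer. This is a simplified, hypothetical sketch, not llama.cpp's actual Q4_K_M format (which uses super-blocks, per-block minimums, and k-quant optimization); it only shows why rounding weights to 4 bits introduces error at all:

```python
import random

def quantize_q4_block(block):
    # One scale per block: map each value to a 4-bit integer in [-8, 7],
    # then dequantize. A much-simplified stand-in for block quantization.
    scale = max(abs(v) for v in block) / 7.0 or 1.0
    return [max(-8, min(7, round(v / scale))) * scale for v in block]

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(32)]  # a pretend 32-weight block
dequant = quantize_q4_block(weights)

# The dequantized values are close to, but not equal to, the originals.
err = sum(abs(a - b) for a, b in zip(weights, dequant)) / len(weights)
print(f"mean abs quantization error: {err:.4f}")
```

The per-weight error is small, but it accumulates across billions of weights, which is why some models degrade noticeably at Q4 while others barely change.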

If you can delete the wrong set of tensors from your repo, I can put the model into my queue and make quants. The outcome will likely be the same as your own Q4_K_M, though, as I'm not doing anything special either - but I can easily make more quants, and also imatrix quants if you wish.

I see, thanks. I will check again whether it is just a randomly bad result or not. You've helped me a lot. :)

I've found the issue: it was caused by an error when saving the model after pretraining. Thank you for spending your time to help me. I will delete the extra weights and close this discussion. I hope I can finish this model soon. Have a good day.

Alsebay changed discussion status to closed

Good luck! BTW, I have quantised quite a few of your models, you are quite productive :)

Oh, thanks XD. Just some experiments to learn LLM fine-tuning and merging. Then I found that I can't use all of them myself, so I upload them to Hugging Face for anyone who needs them. :)

That's why I started to upload quants - TheBloke didn't do them, and doing them just for myself seemed like wasted effort. It's really nice to see so many people working together here.

Me too, it's my pleasure to meet people like you :). I hope you will be even more successful in your work.
Some of my models with the -test suffix may be bad or broken, so you can skip those. Quantizing them would waste both your time and your resources. Thank you.