Are you making a k-quant series of this model?
#1 - opened by mancub
Might wait for q6_K if so, as it holds up nicely against the q4, q5 and q8 quants and usually has a good perplexity score.
I will when I can, but those new k-quants are exclusive to llama.cpp at the moment.
[pytorch2] ubuntu@h100:/workspace/process $ /workspace/git/ggml/build/bin/starcoder-quantize -h
usage: /workspace/git/ggml/build/bin/starcoder-quantize model-f32.bin model-quant.bin type
type = "q4_0" or 2
type = "q4_1" or 3
type = "q5_0" or 8
type = "q5_1" or 9
type = "q8_0" or 7
[pytorch2] ubuntu@h100:/workspace/process $
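For reference, a run with that tool would look something like the line below - the file names are just placeholder examples, not my actual paths, and per the usage text above either the type name or its number should be accepted:

/workspace/git/ggml/build/bin/starcoder-quantize starcoder-f32.bin starcoder-q5_1.bin q5_1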
My bad, I guess - I was reading the model card and it mentioned 2, 3, 4, 5, 6 and 8-bit versions, so somehow I equated seeing a 6 with q6_K, duh.
I'll give the GPTQ model a try instead, since that'll probably give the best speed.
No rush otherwise; I have no idea how you even accomplish everything you do in just 24 hours a day. :)
Oh yeah, sorry, it did say that - I've edited it now. I have a standard GGML README template that assumes the Llama k-quants, and I've not yet got to the point of implementing different README templates for non-Llama models.
I've fixed that now.