This gives gibberish output in text-generation-webui.
I have not been able to get the quantized version to make any sense. The HF version works great, just a bit slow.
Please read the README.md. You either need to update text-generation-webui's GPTQ-for-LLaMa to the latest version, or else use the file koala-13B-4bit-128g.no-act-order.ooba.pt.
I am up to date with the latest files for text-generation-webui and GPTQ-for-LLaMa, and I can confirm I get gibberish as well on the 7B and 13B quantized versions.
When you say you're up-to-date, are you sure you're using the right GPTQ-for-LLaMa version? It needs to be the qwopqwop repo, not the oobabooga fork.
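For reference, swapping in the qwopqwop repo usually looks something like this. The directory layout and the `setup_cuda.py` step are assumptions based on a typical text-generation-webui checkout of that era, so adjust paths to your setup:

```shell
# Paths assume a standard text-generation-webui checkout — adjust as needed
cd text-generation-webui/repositories
mv GPTQ-for-LLaMa GPTQ-for-LLaMa.bak    # set the old fork aside, don't delete it
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install            # rebuild the CUDA quantization kernel
```

If loading still fails after this, restoring the `.bak` copy gets you back to where you started.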
If the update to GPTQ-for-LLaMa is not working for you, just use koala-13B-4bit-128g.no-act-order.ooba.pt. Remove any other pt/safetensors files from your model directory, so that you have only koala-13B-4bit-128g.no-act-order.ooba.pt, and that will work with any version of GPTQ-for-LLaMa.
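One safe way to do that cleanup is to move the other weight files aside rather than delete them. This is just a sketch — the `keep_only` helper below is hypothetical, and you run it from inside whatever directory holds your model files:

```shell
# keep_only: move every .pt/.safetensors file except "$1" into ./unused,
# so the loader only ever sees the one file you want it to pick up.
keep_only() {
  keep="$1"
  mkdir -p unused
  for f in *.pt *.safetensors; do
    [ -e "$f" ] || continue          # skip the literal glob when nothing matches
    [ "$f" = "$keep" ] || mv "$f" unused/
  done
}

# Usage, from inside your model directory:
# keep_only koala-13B-4bit-128g.no-act-order.ooba.pt
```

Moving instead of deleting means you can restore the act-order file later once your GPTQ-for-LLaMa is updated.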
This fixes my problem. Thank you!!!