# Koala: A Dialogue Model for Academic Research

This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.

This version has then been quantized to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
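
For context, a 4-bit, group-size-128 quantization of this kind is normally run through GPTQ-for-LLaMa's `llama.py`. The sketch below shows the general shape only; the input path, the `c4` calibration set, and the output filename are assumptions, not the exact command used for this repo:

```
# Sketch only: 4-bit GPTQ quantization with group size 128.
# The model path and output name are illustrative, not the ones used here.
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/koala-13B-HF c4 \
    --wbits 4 \
    --groupsize 128 \
    --save koala-13B-4bit-128g.pt
```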

## Other Koala repos

I have also made these other Koala repos available:

* [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
* [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
* [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
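
Each of these is an ordinary git repo on Hugging Face, so one way to fetch a full set of weights is the following sketch (the weight files need `git-lfs`):

```
# Example: fetch the unquantized 13B HF weights (large download; requires git-lfs).
git lfs install
git clone https://huggingface.co/TheBloke/koala-13B-HF
```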

[…]

The above commands for running the model in text-generation-webui assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
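
As a rough sketch of what that dependency setup typically involves (each repo's README is authoritative, and the requirements filenames here are assumptions):

```
# Rough shape of the dependency setup; see each repo's README for the real steps.
pip install -r text-generation-webui/requirements.txt
pip install -r GPTQ-for-LLaMa/requirements.txt
```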

If you cannot use the Triton branch for any reason, I believe it should also work to use the CUDA branch instead:

```
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
```

Then link that into `text-generation-webui/repositories` as described above.
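
Concretely, that link step can be a symlink; a minimal sketch, assuming both repos were cloned into the same parent directory:

```
# Make the CUDA-branch checkout visible to text-generation-webui.
# Assumes GPTQ-for-LLaMa and text-generation-webui share a parent directory.
mkdir -p text-generation-webui/repositories
ln -s "$PWD/GPTQ-for-LLaMa" text-generation-webui/repositories/GPTQ-for-LLaMa
```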

## Coming soon

Tomorrow I will upload a `safetensors` file as well.

## How the Koala delta weights were merged

The Koala delta weights were originally merged using the following commands, producing [koala-13B-HF](https://huggingface.co/TheBloke/koala-13B-HF):

```
PYTHON_PATH="${PWD}:$PYTHONPATH" python \
…
    --tokenizer_path=/content/llama-13b/tokenizer.model
```
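
A quick way to sanity-check that a merge like this produced loadable HF weights (a sketch; the path is illustrative and assumes `transformers` is installed):

```
# Illustrative check only: load the merged weights and print the parameter count.
python - <<'EOF'
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/content/koala-13B-HF"  # hypothetical local output path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
print(sum(p.numel() for p in model.parameters()))
EOF
```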

## Further info

Check out the following links to learn more about the Berkeley Koala model.

* [Blog post](https://bair.berkeley.edu/blog/2023/04/03/koala/)
* [Online demo](https://koala.lmsys.org/)