Update README.md
README.md CHANGED
@@ -28,6 +28,12 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
 * [ctransformers](https://github.com/marella/ctransformers)
 
+## Update 9th July 2023: GGML k-quants now available
+
+Thanks to the work of LostRuins/concedo, it is now possible to provide 100% working GGML k-quants for models like this which have a non-standard vocab size (32,001).
+
+k-quants have been uploaded and will work with all llama.cpp clients without any changes required.
+
 ## Repositories available
 
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-13B-V1.1-GPTQ)
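As a usage note on the k-quants announced above: the following is a minimal sketch of loading one of these GGML files with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), one of the supported clients listed in the diff. The model filename, prompt format, and parameter values are hypothetical placeholders, not names confirmed by this repo.

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python) and a k-quant GGML file has
# already been downloaded from this repo. The filename below
# is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin",  # hypothetical filename
    n_ctx=2048,       # context window size
    n_gpu_layers=0,   # raise above 0 to offload layers to GPU, if built with GPU support
)

# Single completion call; the prompt format here is illustrative only.
out = llm("USER: What are k-quants?\nASSISTANT:", max_tokens=128)
print(out["choices"][0]["text"])
```

Because the k-quants carry the model's non-standard vocab size (32,001) in the file itself, no client-side changes should be needed beyond pointing `model_path` at the k-quant file.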