Update README.md
README.md CHANGED
@@ -48,14 +48,14 @@ Example command:
 /workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
 ```
 
-There is no CUDA support at this time, but it should
+There is no CUDA support at this time, but it should be coming soon.
 
 There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.
 
 ## Repositories available
 
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU
+* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU only inference](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGML)
 * [Meta's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/meta-llama/Llama-2-70b-chat)
 
 ## Prompt template: Llama-2-Chat
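
For context on the hunk above: the example command packs several llama.cpp flags onto one line. Below is the same invocation reformatted as a commented sketch, assuming a GGML-era llama.cpp build (before the GGUF migration); every path, flag, and value comes from the example command itself.

```sh
# The README's example command, split out for readability.
# Assumes a GGML-era llama.cpp build (pre-GGUF).
MODEL=llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin

# -gqa 8 is the grouped-query-attention factor the 70B model requires;
# -t 13 sets the CPU thread count; -p supplies the Llama-2-Chat prompt.
/workspace/git/llama.cpp/main \
  -m "$MODEL" \
  -gqa 8 \
  -t 13 \
  -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
```

The `-p` argument also illustrates the Llama-2-Chat format named by the final heading in the hunk: a system message wrapped in `<<SYS>>…<</SYS>>` inside an `[INST] … [/INST]` block.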