Upload README.md
README.md CHANGED
````diff
@@ -12,7 +12,16 @@ tags:
 ---
 Quantizations of https://huggingface.co/google/gemma-2-9b-it
 
-**
+Update (July 7, 2024): **Requantized and reuploaded** using the latest llama.cpp version (b3325); everything should work as expected.
+
+### Inference Clients/UIs
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [JanAI](https://github.com/janhq/jan)
+* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [ollama](https://github.com/ollama/ollama)
+
+---
 
 # From original readme
 
````
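The hunk above adds a list of clients/UIs that can run these GGUF quantizations. As a usage illustration, here is a minimal sketch of loading one of the quantized files from Python via llama-cpp-python, the Python binding for llama.cpp (it is not itself in the list above, and the model filename below is a placeholder for whichever quant you download):

```py
from llama_cpp import Llama

# Placeholder filename: substitute whichever quantized GGUF file you downloaded.
llm = Llama(
    model_path="gemma-2-9b-it-Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers if llama.cpp was built with GPU support
)

# The chat API applies the model's chat template automatically.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write me a poem about Machine Learning."}],
    max_tokens=150,
)
print(response["choices"][0]["message"]["content"])
```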
````diff
@@ -213,4 +222,5 @@ After the prompt is ready, generation can be performed like this:
 ```py
 inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
 outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
-print
+print(tokenizer.decode(outputs[0]))
+```
````
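The second hunk completes the generation example from the original readme, which apparently lacked its closing code fence. For reference, combining it with the model-loading and chat-template steps that precede it in that readme gives a self-contained version:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Build the prompt with the model's chat template, then generate as in the diff above.
messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```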
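The update note in the first hunk says the files were requantized with llama.cpp b3325, but the commands used are not spelled out. A rough sketch of the standard llama.cpp convert-then-quantize workflow, with assumed paths, filenames, and quant type:

```py
# Assumed workflow sketch: the exact commands, paths, and quant types used for
# this repo are not stated in the readme; tool names are from recent llama.cpp.
import subprocess

# 1. Convert the original Hugging Face checkpoint to a full-precision GGUF file.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py", "./gemma-2-9b-it",
        "--outfile", "gemma-2-9b-it-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2. Quantize the f16 GGUF down to a smaller type (Q4_K_M chosen as an example).
subprocess.run(
    ["./llama-quantize", "gemma-2-9b-it-f16.gguf", "gemma-2-9b-it-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```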