duyntnet committed ee82a04 (verified; parent ac0491d): Upload README.md

Files changed: README.md (+12 −2)
@@ -12,7 +12,16 @@ tags:
 ---
 Quantizations of https://huggingface.co/google/gemma-2-9b-it
 
-**Note**: You will need latest [llama.cpp](https://github.com/ggerganov/llama.cpp/releases) (b3259 or later) to run Gemma 2.
+Update (July 7, 2024): **Requantized and reuploaded** using llama.cpp latest version (b3325), everything should work as expected.
+
+### Inference Clients/UIs
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [JanAI](https://github.com/janhq/jan)
+* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [ollama](https://github.com/ollama/ollama)
+
+---
 
 # From original readme
 
@@ -213,4 +222,5 @@ After the prompt is ready, generation can be performed like this:
 ```py
 inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
 outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
-print
+print(tokenizer.decode(outputs[0]))
+```