Upload README.md
README.md CHANGED
````diff
@@ -12,7 +12,16 @@ tags:
 ---
 Quantizations of https://huggingface.co/google/gemma-2-9b-it
 
-**
+Update (July 7, 2024): **Requantized and reuploaded** using the latest llama.cpp version (b3325); everything should work as expected.
+
+### Inference Clients/UIs
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
+* [JanAI](https://github.com/janhq/jan)
+* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+* [ollama](https://github.com/ollama/ollama)
+
+---
 
 # From original readme
 
````
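The hunk above adds a list of clients/UIs that can run these GGUF quantizations. As a usage illustration, here is a minimal sketch of loading one of the quantized files from Python via llama-cpp-python, the Python binding for llama.cpp (it is not itself in the list above, and the model filename below is a placeholder for whichever quant you download):

```py
from llama_cpp import Llama

# Placeholder filename: substitute whichever quantized GGUF file you downloaded.
llm = Llama(
    model_path="gemma-2-9b-it-Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers if llama.cpp was built with GPU support
)

# The chat API applies the model's chat template automatically.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write me a poem about Machine Learning."}],
    max_tokens=150,
)
print(response["choices"][0]["message"]["content"])
```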
````diff
@@ -213,4 +222,5 @@ After the prompt is ready, generation can be performed like this:
 ```py
 inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
 outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
-print
+print(tokenizer.decode(outputs[0]))
+```
````
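The second hunk completes the generation example from the original readme, which apparently lacked its closing code fence. For reference, combining it with the model-loading and chat-template steps that precede it in that readme gives a self-contained version:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Build the prompt with the model's chat template, then generate as in the diff above.
messages = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```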
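The update note in the first hunk says the files were requantized with llama.cpp b3325, but the commands used are not spelled out. A rough sketch of the standard llama.cpp convert-then-quantize workflow, with assumed paths, filenames, and quant type:

```py
# Assumed workflow sketch: the exact commands, paths, and quant types used for
# this repo are not stated in the readme; tool names are from recent llama.cpp.
import subprocess

# 1. Convert the original Hugging Face checkpoint to a full-precision GGUF file.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py", "./gemma-2-9b-it",
        "--outfile", "gemma-2-9b-it-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# 2. Quantize the f16 GGUF down to a smaller type (Q4_K_M chosen as an example).
subprocess.run(
    ["./llama-quantize", "gemma-2-9b-it-f16.gguf", "gemma-2-9b-it-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```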