Update README.md
README.md CHANGED
@@ -17,6 +17,12 @@ base_model: google/gemma-3-1b-it-qat-q4_0-unquantized
 
 ## Llamacpp imatrix Quantizations of gemma-3-1b-it-qat by google
 
+These are derived from the QAT (quantization-aware training) weights provided by Google.
+
+*ONLY* Q4_0 is expected to be better, but while I'm at it I figured I might as well make the other quants and see what happens.
+
+[gemma-3-1b-it-qat-Q4_0.gguf](https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_0.gguf) | Q4_0 | 0.72GB | Should be improved due to QAT; offers online repacking for ARM and AVX CPU inference.
+
 Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b5147">b5147</a> for quantization.
 
 Original model: https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-unquantized
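
For anyone wanting to try the Q4_0 file referenced in the added table row, here is a minimal sketch using the `huggingface_hub` and `llama-cpp-python` packages — an assumption on my part, as the card itself only links the file; the repo id and filename come from the download link in the diff above.

```python
# Minimal sketch: download the Q4_0 GGUF and run one chat completion.
# Assumes `pip install huggingface_hub llama-cpp-python`; neither package
# is named in the model card itself.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Repo id and filename taken from the download link in the README diff.
model_path = hf_hub_download(
    repo_id="bartowski/google_gemma-3-1b-it-qat-GGUF",
    filename="google_gemma-3-1b-it-qat-Q4_0.gguf",
)

# llama.cpp repacks Q4_0 blocks at load time for ARM/AVX CPUs, which is
# the "online repacking" the table row refers to.
llm = Llama(model_path=model_path, n_ctx=2048)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```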