Update README.md
---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-4b-it
tags:
- Google
- Gemma3
- GGUF
- 4b-it
---

# Google Gemma 3 4B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of [Google's Gemma 3 4B instruction-tuned model](https://huggingface.co/google/gemma-3-4b-it), optimized for efficient deployment across various hardware configurations.

## Quantization Results

| Model | Size   | Size (% of F16) | Reduction vs. F16 |
|-------|--------|-----------------|-------------------|
| Q8_0  | 4.1 GB | 53%             | 47%               |
| Q6_K  | 3.2 GB | 41%             | 59%               |
| Q4_K  | 2.5 GB | 32%             | 68%               |
| Q2_K  | 1.7 GB | 22%             | 78%               |

Percentages are relative to the full-precision F16 GGUF, which these figures imply is roughly 7.7 GB (e.g. 4.1 GB / 7.7 GB ≈ 53%).

## Quality vs Size Trade-offs

- **Q8_0**: Near-lossless quality; minimal degradation compared to F16
- **Q6_K**: Very good quality; slight degradation in rare cases
- **Q4_K**: Decent quality; noticeable degradation, but still usable for most tasks
- **Q2_K**: Heavily reduced quality; substantial degradation, but the smallest file size

## Recommendations

- For **maximum quality**: use F16 or Q8_0
- For **balanced performance**: use Q6_K
- For **minimum size**: use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size

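Once you have picked a quantization level, you can fetch just that file rather than cloning the whole repository. A minimal sketch using the `huggingface-cli` tool from `huggingface_hub`; the repository id below is a placeholder, and the filename is the Q4_K name from the usage example in the next section, so check this repo's "Files and versions" tab for the exact values:

```bash
# Hypothetical repo id -- substitute the actual id shown on this model page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download lex-au/REPO_NAME gemma-3-4b-it-q4k.gguf --local-dir .
```
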
## Usage with llama.cpp

These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:

```bash
# Running with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```

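The same files also work with llama.cpp's OpenAI-compatible HTTP server. A minimal sketch, assuming the `llama-server` binary from a recent llama.cpp build and the Q4_K file from the example above:

```bash
# Serve the model; llama-server exposes an OpenAI-compatible API on the given port.
./llama-server --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --port 8080

# In another shell, query the chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarise GGUF quantization in two sentences."}]}'
```
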
## License

These quantized models are released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.

## Original Model Information

This quantized set is derived from [Google's Gemma 3 4B instruction-tuned model](https://huggingface.co/google/gemma-3-4b-it).

### Model Specifications
- **Architecture**: Gemma 3
- **Size Label**: 4B
- **Type**: Instruction-tuned
- **Context Length**: 131K tokens
- **Embedding Length**: 2560
- **Languages**: Multilingual; this card's metadata lists English, Chinese, and Spanish

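You can check these specifications against a downloaded file's GGUF header. A quick sketch using the `gguf-dump` utility from the `gguf` Python package; the filename is the hypothetical Q4_K file from the usage example, and the exact metadata key names may vary between converter versions:

```bash
# Install the gguf tooling and dump the header metadata of the Q4_K file.
pip install gguf
gguf-dump gemma-3-4b-it-q4k.gguf | grep -iE "general.architecture|context_length|embedding_length"
```
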
## Citation & Attribution

```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 4B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 4B},
  url={https://huggingface.co/lex-au}
}
```