---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---
# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models
This repository contains GGUF quantized versions of Google's Gemma 3 1B instruction-tuned model, optimized for efficient deployment across various hardware configurations.
## Quantization Results

| Model | Size | Compression Ratio | Size Reduction |
|---|---|---|---|
| Q8_0 | 1.07 GB | 54% | 46% |
| Q6_K | 1.01 GB | 51% | 49% |
| Q4_K | 0.81 GB | 40% | 60% |
| Q2_K | 0.69 GB | 34% | 66% |
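The percentages above can be sanity-checked with a little arithmetic. A minimal sketch, assuming the original F16 model is roughly 2.0 GB (a value inferred from the table's own ratios, not measured); the computed figures agree with the table to within a percentage point:

```python
# Check the compression figures against an ASSUMED ~2.0 GB F16 baseline
# (inferred from the table's percentages, not taken from the repository).
F16_GB = 2.0

quants = {"Q8_0": 1.07, "Q6_K": 1.01, "Q4_K": 0.81, "Q2_K": 0.69}

for name, size_gb in quants.items():
    ratio = size_gb / F16_GB * 100   # quantized size relative to F16
    reduction = 100 - ratio          # space saved versus F16
    print(f"{name}: {ratio:.1f}% of F16, {reduction:.1f}% smaller")
```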
### Quality vs Size Trade-offs
- Q8_0: Near-lossless quality, minimal degradation compared to F16
- Q6_K: Very good quality, slight degradation in some rare cases
- Q4_K: Decent quality, noticeable degradation but still usable for most tasks
- Q2_K: Heavily reduced quality, substantial degradation but smallest file size
## Recommendations
- For maximum quality: Use Q8_0
- For balanced performance: Use Q6_K
- For minimum size: Use Q2_K
- For most use cases: Q4_K provides a good balance of quality and size
## Usage with llama.cpp

These models can be used with llama.cpp and its various interfaces. Example:

```bash
# Run with llama-gemma3-cli (adjust paths as needed; add the .exe suffix on Windows)
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```
## License
This model is released under the same Gemma license as the original model.
## Original Model Information
This quantized set is derived from Google's Gemma 3 1B instruction-tuned model.
### Model Specifications
- Architecture: Gemma 3
- Size Label: 1B
- Type: Instruction-tuned
- Context Length: 32K tokens
- Embedding Length: 2048
- Languages: Multilingual, including English, Chinese, and Spanish
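The 32K context length carries a memory cost on top of the model weights: llama.cpp allocates a KV cache that grows linearly with the context size. A rough sketch of that arithmetic, where the layer and head counts below are hypothetical placeholders (read the actual values from the GGUF metadata, e.g. with `gguf-dump`):

```python
# Rough KV-cache size estimate for a given context window.
# n_layers, n_kv_heads and head_dim here are HYPOTHETICAL placeholders,
# not the real Gemma 3 1B values; check the GGUF metadata for those.
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two tensors (K and V) per layer, one head_dim vector per KV head
    # per token position, at bytes_per_elem each (2 bytes for an F16 cache).
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Full 32K window with placeholder architecture values and an F16 cache
size = kv_cache_bytes(n_ctx=32768, n_layers=26, n_kv_heads=1, head_dim=256)
print(f"{size / 1024**2:.0f} MiB")
```

Halving the context (`--ctx-size 16384`) halves this figure, which is often the easiest lever on memory-constrained hardware.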
## Citation & Attribution

```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```