Update README.md
---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-4b-it
tags:
- Google
- Gemma3
- GGUF
- 4b-it
---

# Google Gemma 3 4B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of [Google's Gemma 3 4B instruction-tuned model](https://huggingface.co/google/gemma-3-4b-it), optimized for efficient deployment across various hardware configurations.

## Quantization Results

| Model | Size   | Size (% of F16) | Reduction vs. F16 |
|-------|--------|-----------------|-------------------|
| Q8_0  | 4.1 GB | 53%             | 47%               |
| Q6_K  | 3.2 GB | 41%             | 59%               |
| Q4_K  | 2.5 GB | 32%             | 68%               |
| Q2_K  | 1.7 GB | 22%             | 78%               |

Percentages are relative to the full-precision F16 GGUF, which these figures imply is roughly 7.7 GB (e.g. 4.1 GB / 7.7 GB ≈ 53%).

## Quality vs Size Trade-offs

- **Q8_0**: Near-lossless quality; minimal degradation compared to F16
- **Q6_K**: Very good quality; slight degradation in rare cases
- **Q4_K**: Decent quality; noticeable degradation, but still usable for most tasks
- **Q2_K**: Heavily reduced quality; substantial degradation, but the smallest file size

## Recommendations

- For **maximum quality**: use F16 or Q8_0
- For **balanced performance**: use Q6_K
- For **minimum size**: use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size

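Once you have picked a quantization level, you can fetch just that file rather than cloning the whole repository. A minimal sketch using the `huggingface-cli` tool from `huggingface_hub`; the repository id below is a placeholder, and the filename is the Q4_K name from the usage example in the next section, so check this repo's "Files and versions" tab for the exact values:

```bash
# Hypothetical repo id -- substitute the actual id shown on this model page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download lex-au/REPO_NAME gemma-3-4b-it-q4k.gguf --local-dir .
```
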
## Usage with llama.cpp

These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:

```bash
# Running with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```

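The same files also work with llama.cpp's OpenAI-compatible HTTP server. A minimal sketch, assuming the `llama-server` binary from a recent llama.cpp build and the Q4_K file from the example above:

```bash
# Serve the model; llama-server exposes an OpenAI-compatible API on the given port.
./llama-server --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --port 8080

# In another shell, query the chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarise GGUF quantization in two sentences."}]}'
```
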
## License

These quantized models are released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.

## Original Model Information

This quantized set is derived from [Google's Gemma 3 4B instruction-tuned model](https://huggingface.co/google/gemma-3-4b-it).

### Model Specifications
- **Architecture**: Gemma 3
- **Size Label**: 4B
- **Type**: Instruction-tuned
- **Context Length**: 131K tokens
- **Embedding Length**: 2560
- **Languages**: Multilingual; this card's metadata lists English, Chinese, and Spanish

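You can check these specifications against a downloaded file's GGUF header. A quick sketch using the `gguf-dump` utility from the `gguf` Python package; the filename is the hypothetical Q4_K file from the usage example, and the exact metadata key names may vary between converter versions:

```bash
# Install the gguf tooling and dump the header metadata of the Q4_K file.
pip install gguf
gguf-dump gemma-3-4b-it-q4k.gguf | grep -iE "general.architecture|context_length|embedding_length"
```
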
## Citation & Attribution

```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 4B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 4B},
  url={https://huggingface.co/lex-au}
}
```