---
license: gemma
language:
- en
- zh
- es
base_model:
- google/gemma-3-1b-it
tags:
- Google
- Gemma3
- GGUF
- 1b-it
---

# Google Gemma 3 1B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it), optimized for efficient deployment across various hardware configurations.

## Quantization Results

| Model | Size    | Compression Ratio | Size Reduction |
|-------|---------|-------------------|----------------|
| Q8_0  | 1.07 GB | 54%               | 46%            |
| Q6_K  | 1.01 GB | 51%               | 49%            |
| Q4_K  | 0.81 GB | 40%               | 60%            |
| Q2_K  | 0.69 GB | 34%               | 66%            |

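As a sanity check on the table, the two percentage columns are complements, computed against the F16 baseline. A minimal sketch — note the baseline size (~1.98 GB) is an assumption inferred from the table (1.07 GB / 0.54), not a figure published in this repo:

```python
# Sketch: how the table's columns relate. The F16 baseline size is an
# assumption inferred from the table (1.07 GB / 0.54 ≈ 1.98 GB).
F16_BASELINE_GB = 1.98

def quant_stats(quant_size_gb: float, baseline_gb: float = F16_BASELINE_GB):
    """Return (compression ratio %, size reduction %); they sum to 100."""
    ratio = round(100 * quant_size_gb / baseline_gb)
    return ratio, 100 - ratio

for name, size in [("Q8_0", 1.07), ("Q6_K", 1.01), ("Q4_K", 0.81), ("Q2_K", 0.69)]:
    ratio, reduction = quant_stats(size)
    print(f"{name}: {ratio}% of F16 size, {reduction}% smaller")
```

Small rounding differences against the table are expected, since the published figures were presumably computed from exact byte counts rather than two-decimal GB values.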
## Quality vs Size Trade-offs

- **Q8_0**: Near-lossless quality; minimal degradation compared to F16
- **Q6_K**: Very good quality; slight degradation in rare cases
- **Q4_K**: Decent quality; noticeable degradation but still usable for most tasks
- **Q2_K**: Heavily reduced quality; substantial degradation, but the smallest file size

## Recommendations

- For **maximum quality**: use Q8_0
- For **balanced performance**: use Q6_K
- For **minimum size**: use Q2_K
- For **most use cases**: Q4_K provides a good balance of quality and size

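The recommendations above can be folded into a tiny selection helper. Sizes come from the table; the function name and budget logic are illustrative, not part of this repo:

```python
# Illustrative helper: given a disk/RAM budget, pick the highest-quality
# quant from this repo that fits. Sizes are the file sizes from the table.
QUANTS = [  # ordered from highest quality to smallest file
    ("Q8_0", 1.07),
    ("Q6_K", 1.01),
    ("Q4_K", 0.81),
    ("Q2_K", 0.69),
]

def pick_quant(budget_gb: float) -> str:
    """Return the highest-quality quant whose file fits within budget_gb."""
    for name, size_gb in QUANTS:
        if size_gb <= budget_gb:
            return name
    raise ValueError(f"no quant fits a {budget_gb} GB budget")
```

For example, with roughly 1 GB to spare this picks Q4_K, matching the "most use cases" recommendation above.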
## Usage with llama.cpp

These models can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and its various interfaces. Example:

```bash
# Run with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model Google.Gemma-3-1b-it-Q4_K.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
```

## License

This model is released under the same [Gemma license](https://ai.google.dev/gemma/terms) as the original model.

## Original Model Information

This quantized set is derived from [Google's Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it).

### Model Specifications

- **Architecture**: Gemma 3
- **Size Label**: 1B
- **Type**: Instruction-tuned
- **Context Length**: 32K tokens
- **Embedding Length**: 2048
- **Languages**: Multilingual; this repo is tagged for English, Chinese, and Spanish

## Citation & Attribution

```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}

@misc{gemma3_quantization_2025,
  title={Quantized Versions of Google's Gemma 3 1B Model},
  author={Lex-au},
  year={2025},
  month={March},
  note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 1B},
  url={https://huggingface.co/lex-au}
}
```