# Gemma-3-1b-it Q4_0 Quantized Model
This is a Q4_0 quantized version of the google/gemma-3-1b-it
model, converted to GGUF format and optimized for efficient inference. It was created using llama.cpp
tools in Google Colab.
## Model Details

- Base Model: google/gemma-3-1b-it
- Quantization: Q4_0 (4-bit quantization)
- Format: GGUF
- Size: ~1–1.5 GB
- Converted Using: llama.cpp (commit from April 2025)
- License: Inherits the license from google/gemma-3-1b-it
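If you want the quantized file locally before running it, `huggingface-cli` can fetch it directly. A minimal sketch; the repo id below is a hypothetical placeholder for wherever this model is hosted:

```bash
# Hypothetical repo id: substitute the actual Hugging Face repo for this model.
huggingface-cli download your-username/gemma-3-1b-it-Q4_0-GGUF \
  gemma-3-1b-it-Q4_0.gguf --local-dir .
```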
## Usage

To use this model with llama.cpp:
```bash
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" --no-interactive
```
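Beyond a one-shot CLI call, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming the default endpoint and a local port of 8080:

```bash
# Serve the model over HTTP (llama-server is built alongside llama-cli).
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080

# From another shell, query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, world!"}]}'
```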
## How It Was Created

1. Downloaded google/gemma-3-1b-it from Hugging Face.
2. Converted it to GGUF using convert_hf_to_gguf.py.
3. Quantized it to Q4_0 using llama-quantize from llama.cpp.
4. Tested it in Google Colab with llama-cli (see the command sketch below).
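A minimal command-level sketch of those four steps, assuming a fresh llama.cpp checkout with a CMake build; the exact paths and flags used in the Colab run may have differed:

```bash
# Get and build llama.cpp (provides convert_hf_to_gguf.py, llama-quantize, llama-cli).
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
cmake -B llama.cpp/build llama.cpp && cmake --build llama.cpp/build --config Release

# 1. Download the base model from Hugging Face.
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the HF checkpoint to a full-precision GGUF intermediate.
python llama.cpp/convert_hf_to_gguf.py gemma-3-1b-it \
  --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize the intermediate to Q4_0.
./llama.cpp/build/bin/llama-quantize gemma-3-1b-it-f16.gguf \
  gemma-3-1b-it-Q4_0.gguf Q4_0

# 4. Smoke-test the quantized file.
./llama.cpp/build/bin/llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello"
```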
## Limitations

- Quantization may reduce accuracy compared to the original model (see the perplexity check below).
- Requires llama.cpp or compatible software for inference.
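One way to put a number on the accuracy loss is llama.cpp's `llama-perplexity` tool. A sketch, assuming you kept the f16 intermediate and have a held-out text file such as wikitext's wiki.test.raw:

```bash
# Lower perplexity is better; compare quantized vs. full-precision on the same text.
./llama-perplexity -m gemma-3-1b-it-Q4_0.gguf -f wiki.test.raw
./llama-perplexity -m gemma-3-1b-it-f16.gguf -f wiki.test.raw
```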
## Acknowledgments

- Based on the work of bartowski for GGUF quantization.
- Uses llama.cpp by Georgi Gerganov.