Gemma-3-1b-it Q4_0 Quantized Model

This is a Q4_0 quantized version of google/gemma-3-1b-it, converted to the GGUF format for memory-efficient local inference. It was created with llama.cpp tooling in Google Colab.

Model Details

  • Base Model: google/gemma-3-1b-it
  • Quantization: Q4_0 (4-bit quantization)
  • Format: GGUF
  • Size: ~1–1.5 GB on disk (a rough estimate is worked out below)
  • Converted Using: llama.cpp (commit from April 2025)
  • License: Inherits the license from google/gemma-3-1b-it
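
As a rough size check (an estimate, not a figure from the original card): Q4_0 stores weights in blocks of 32, each block holding 32 four-bit values plus one 16-bit scale, i.e. 18 bytes per 32 weights, or 4.5 bits per weight. For ~1B parameters that gives about 1e9 × 4.5 / 8 ≈ 0.56 GB if every tensor were Q4_0; tensors that llama.cpp typically keeps at higher precision (such as token embeddings) push the actual file toward the ~1–1.5 GB quoted above.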

Usage

To use this model with llama.cpp:

./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" --no-interactive
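
Beyond a one-shot prompt, the same binary can run an interactive chat session, and llama-server exposes an OpenAI-compatible HTTP API. The flags below are taken from upstream llama.cpp and are assumptions about your build, not commands from the original card:

# Interactive chat using the model's built-in chat template
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --conversation --ctx-size 2048

# Serve an OpenAI-compatible HTTP endpoint on port 8080
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080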

How It Was Created

  1. Downloaded google/gemma-3-1b-it from Hugging Face.
  2. Converted to GGUF using convert_hf_to_gguf.py.
  3. Quantized to Q4_0 using llama-quantize from llama.cpp.
  4. Tested in Google Colab with llama-cli (the full pipeline is sketched below).
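
A shell sketch of those four steps (file names, paths, and the build procedure are assumptions; the original run happened in a Google Colab notebook):

# 0. Get and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# 1. Download the base model (requires accepting the Gemma license on Hugging Face)
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the Hugging Face checkpoint to GGUF
python convert_hf_to_gguf.py gemma-3-1b-it --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize to Q4_0
./build/bin/llama-quantize gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0

# 4. Smoke-test the quantized model
./build/bin/llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!"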

Limitations

  • Q4_0 quantization may reduce output quality compared to the original full-precision model.
  • Inference requires llama.cpp or other GGUF-compatible software.

Acknowledgments

  • GGUF quantization workflow based on the work of bartowski.
  • Uses llama.cpp by Georgi Gerganov.