Gemma-3-1b-it Q4_0 Quantized Model

This is a Q4_0 quantized version of google/gemma-3-1b-it, converted to the GGUF format for memory-efficient local inference. It was created with llama.cpp tooling in Google Colab.

Model Details

  • Base Model: google/gemma-3-1b-it
  • Quantization: Q4_0 (4-bit quantization)
  • Format: GGUF
  • Size: ~1–1.5 GB on disk (a rough estimate is worked out below)
  • Converted Using: llama.cpp (commit from April 2025)
  • License: Inherits the license from google/gemma-3-1b-it
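
As a rough size check (an estimate, not a figure from the original card): Q4_0 stores weights in blocks of 32, each block holding 32 four-bit values plus one 16-bit scale, i.e. 18 bytes per 32 weights, or 4.5 bits per weight. For ~1B parameters that gives about 1e9 × 4.5 / 8 ≈ 0.56 GB if every tensor were Q4_0; tensors that llama.cpp typically keeps at higher precision (such as token embeddings) push the actual file toward the ~1–1.5 GB quoted above.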

Usage

To use this model with llama.cpp:

./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" --no-interactive
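
Beyond a one-shot prompt, the same binary can run an interactive chat session, and llama-server exposes an OpenAI-compatible HTTP API. The flags below are taken from upstream llama.cpp and are assumptions about your build, not commands from the original card:

# Interactive chat using the model's built-in chat template
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --conversation --ctx-size 2048

# Serve an OpenAI-compatible HTTP endpoint on port 8080
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080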

How It Was Created

  1. Downloaded google/gemma-3-1b-it from Hugging Face.
  2. Converted to GGUF using convert_hf_to_gguf.py.
  3. Quantized to Q4_0 using llama-quantize from llama.cpp.
  4. Tested in Google Colab with llama-cli (the full pipeline is sketched below).
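
A shell sketch of those four steps (file names, paths, and the build procedure are assumptions; the original run happened in a Google Colab notebook):

# 0. Get and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# 1. Download the base model (requires accepting the Gemma license on Hugging Face)
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# 2. Convert the Hugging Face checkpoint to GGUF
python convert_hf_to_gguf.py gemma-3-1b-it --outfile gemma-3-1b-it-f16.gguf --outtype f16

# 3. Quantize to Q4_0
./build/bin/llama-quantize gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0

# 4. Smoke-test the quantized model
./build/bin/llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!"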

Limitations

  • Q4_0 quantization may reduce output quality compared to the original full-precision model.
  • Inference requires llama.cpp or other GGUF-compatible software.

Acknowledgments

  • GGUF quantization workflow based on the work of bartowski.
  • Uses llama.cpp by Georgi Gerganov.