Edit model card

This repository contains gemma 2B models quantized using llama.cpp.

For details of the model see https://huggingface.co/google/gemma-2b-it.

Details of the k-quants can be found here: https://github.com/ggerganov/llama.cpp/pull/1684

Provided files

Name Quant method Bits Size
gemma-2b-it-Q4_K_M.gguf Q4_K_M 4 1.63 GB
gemma-2b-it-Q5_K_M.gguf Q5_K_M 5 1.84 GB
Downloads last month
12
GGUF
Model size
2.51B params
Architecture
gemma

4-bit

5-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Space using operablepattern/gemma-2b-it-Q 1