This is an HQQ-quantized version (4-bit, group-size=64) of the gemma-3-12b-it model.
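For reference, a similar HQQ quantization can be produced on the fly with transformers' built-in `HqqConfig`. This is a minimal sketch mirroring the settings on this card (4-bit, group-size 64), not the exact recipe used to build this checkpoint; the base model id below is an assumption:

```python
# Sketch: on-the-fly HQQ quantization with transformers' HqqConfig.
# nbits/group_size mirror this card; the base checkpoint is an assumption.
import torch
from transformers import Gemma3ForConditionalGeneration, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)

model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-12b-it",   # assumed base model
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    quantization_config=quant_config,  # quantize linear layers at load time
)
```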
## Performance
| Benchmark | bfp16 | HQQ 4-bit gs-64 | QAT 4-bit gs-32 |
|---|---|---|---|
| ARC (25-shot) | 0.724 | 0.701 | 0.690 |
| HellaSwag (10-shot) | 0.839 | 0.826 | 0.792 |
| MMLU (5-shot) | 0.730 | 0.724 | 0.693 |
| TruthfulQA-MC2 | 0.580 | 0.585 | 0.550 |
| Winogrande (5-shot) | 0.766 | 0.774 | 0.755 |
| GSM8K (5-shot) | 0.874 | 0.862 | 0.808 |
| **Average** | **0.752** | **0.745** | **0.715** |
## Usage
```python
# Requires transformers at commit 52cc204dd7fbd671452448028aae6262cea74dc2:
# pip install git+https://github.com/huggingface/transformers@52cc204dd7fbd671452448028aae6262cea74dc2
import torch
from transformers import Gemma3ForConditionalGeneration, AutoProcessor

backend       = "gemlite"
compute_dtype = torch.bfloat16
cache_dir     = None
model_id      = 'mobiuslabsgmbh/gemma-3-12b-it_4bitgs64_bfp16_hqq_hf'

# Load the processor and the pre-quantized model
processor = AutoProcessor.from_pretrained(model_id, cache_dir=cache_dir)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=compute_dtype,
    attn_implementation="sdpa",
    cache_dir=cache_dir,
    device_map="cuda",
)

# Optimize: patch the quantized linear layers to run on the GemLite backend
from hqq.utils.patching import prepare_for_inference
prepare_for_inference(model.language_model, backend=backend, verbose=True)

############################################################################
# Inference
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=compute_dtype)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    # Greedy decode, then strip the prompt tokens from the output
    generation = model.generate(**inputs, max_new_tokens=128, do_sample=False)[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
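The same processor and model handle text-only chat as well; a minimal variant of the pipeline above (the prompt text is illustrative):

```python
# Text-only chat with the same processor/model (illustrative prompt).
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Explain HQQ quantization in one sentence."}]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```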