Does not work with bitsandbytes 4-bit and 8-bit

#27
by zokica - opened

While the 1B model works fine in 4-bit, the 4B model does not. Why is that?

Full-precision answer:
#####################################
outputs ["user\nYou are a helpful assistant.\n\nWrite a poem on Hugging Face, the company\nmodel\nOkay, here's a poem about Hugging Face, aiming to capture its spirit and impact:\n\nThe Open Embrace\n\nIn realms of code, a vibrant hue,\nHugging Face emerges, fresh and new.\nNot just a name, a welcoming plea,\nFor AI’s future, wild"]
###################

8-bit answer:
#############################
outputs ['user\nYou are a helpful assistant.\n\nWrite a poem on Hugging Face, the company\nmodel\nThisrvice gaក៏ forতি지만 Senhor noname συ᱕ Brain freeze not기는 объявitic fregataamataء rou अॅ⿻ffassoääntsmannetworks 둘이ई क्रिप्टोकर策划 deメリット\u200cی� भाgal gydant recovering the গঙ্গన్న आएगी भी Olméis सबके/ヶ月𝔦 मलया ഒ საერთ出しिष्ट সকলে齊,']
##################

from transformers import AutoTokenizer, BitsAndBytesConfig, Gemma3ForCausalLM, Gemma3ForConditionalGeneration
import torch

model_id = "google/gemma-3-4b-it"

# Load the 4B model in 8-bit via bitsandbytes.
# (Gemma3ForCausalLM is what the text-only 1B checkpoint uses; the 4B model is multimodal.)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quantization_config
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"}],
        },
    ],
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # .to(torch.bfloat16)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)

outputs = tokenizer.batch_decode(outputs)
print("outputs", outputs)
