Generating garbled output

#42
by raminh921 - opened

The model is generating garbled output.

python 3.10
bitsandbytes 0.45.2
transformers 4.48.3
CUDA Version: 12.5

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "/home/models/gemma-2-27b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
Output:
<bos>Write me a poem about Machine Learning.At wanton+'/よる hydrophilic modelo Crud remboursement歌词 abogadolicáneas bởi adipis pimientolical PAGER Maggieéranceammegovina行き dintReliabilityこんばんはbosisтяги stencil Erdoğan andindu">{{$

Seems to be similar to https://huggingface.co/google/gemma-2-27b-it/discussions/32

I had the same issue, and adding torch_dtype=torch.bfloat16 helped. In your case, the model-loading code needs to be modified to:

import torch

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,  # Missing this was the culprit
)
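For completeness, here is a minimal sketch of the full script with the fix applied (the model path and prompt are taken from the original post):

# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 8-bit with bitsandbytes, but keep the
# non-quantized parts of the model in bfloat16 instead of the float32 default.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "/home/models/gemma-2-27b-it"  # local path from the original post

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))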
Google org

Hi @raminh921, kindly update the bitsandbytes example to load the model using torch_dtype=torch.bfloat16. I have tested and reproduced the fix. Please refer to this gist file for reference. If you have any concerns, let me know and I will assist you.

Thank you.

Thanks for the help.
I ran this script on a V100. Later, I found that the V100 does not natively support bfloat16, so it fell back to emulating bfloat16 with float32, which caused problems.
I tried an A100 and it works correctly.
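If you need to run on a GPU without native bfloat16 support (such as the V100), one option is to detect support at runtime and fall back to float16. A minimal sketch, with the caveat that float16 can introduce its own numerical issues, so verify the generated output:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# bfloat16 is natively supported on Ampere (e.g. A100) and newer GPUs;
# torch.cuda.is_bf16_supported() returns False on a V100, so fall back to float16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "/home/models/gemma-2-27b-it",  # local path from the original post
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=dtype,
)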
Best

Google org

Hi @raminh921, could you please confirm whether the issue is resolved? If so, feel free to close this discussion; if you have any concerns, let us know and we will assist you. Thank you.
