CUDA error: misaligned address

#24
by msi-sbraun-11 - opened

Hi there,

I am trying to use the Gemma 3 12B IT model to generate QA pairs. The pipeline is defined as follows:

model_id = "google/gemma-3-12b-it" # google/gemma-3-12b-it

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype = torch.float32,
        device_map="cuda",
        quantization_config=bnb_config
        )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if tokenizer.pad_token is None:
        eos_token_id = model.config.eos_token_id
        eos_token = tokenizer.decode(eos_token_id)
        tokenizer.pad_token = eos_token  # this is a string, which is expected

    text_gen_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,
        torch_dtype=torch.float32, 
        top_p = 0.95,
        top_k = 70,
        temperature = 1.25,
        do_sample=True,
        repetition_penalty=1.3,
    )

    llm = HuggingFacePipeline(pipeline=text_gen_pipeline)

    model = ChatHuggingFace(llm=llm)

When I call this model via the invoke function, at some point it throws the following error:

  File "/home/nokia-proj/miniconda3/envs/vrag/lib/python3.10/site-packages/transformers/integrations/sdpa_attent
ion.py", line 54, in sdpa_attention_forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: CUDA error: misaligned address

Any ideas why this error was encountered and how to resolve this?

Thank you!

Hi @msi-sbraun-11 ,

Welcome to the Google Gemma family of open models. As I can see in your code, you are passing a HuggingFacePipeline to ChatHuggingFace, which is not supported if you imported ChatHuggingFace from the following import statements:

from langchain.llms import HuggingFacePipeline
from langchain_community.chat_models import ChatHuggingFace
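
One option, assuming the langchain-huggingface partner package is installed in your environment (this is an assumption, not something visible in your snippet), would be to import both classes from that package instead, since its ChatHuggingFace is built to wrap a HuggingFacePipeline:

# pip install langchain-huggingface  (assumed to be available; adjust to your setup)
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace

llm = HuggingFacePipeline(pipeline=text_gen_pipeline)
chat_model = ChatHuggingFace(llm=llm)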

The above issue might also occur due to a data-type conflict: BitsAndBytesConfig is designed to work best with lower-precision data types like torch.bfloat16 or torch.float16, and mixing float32 with 4-bit quantization can lead to alignment problems during computation. If you would like to run the model with full 32-bit precision, it is recommended not to use quantization; if you do want quantization, torch.bfloat16 works best.
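
For example, a minimal sketch of loading the model with 4-bit quantization and bfloat16 throughout might look like this (the bnb_4bit_compute_dtype and bnb_4bit_quant_type values are illustrative defaults, not taken from your code):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute dtype consistent with the model dtype
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it",
    torch_dtype=torch.bfloat16,  # avoid mixing float32 with 4-bit weights
    device_map="cuda",
    quantization_config=bnb_config,
)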

If you are facing an issue with the SDPA attention mechanism, you can try disabling it with the following code.

import os
os.environ['TORCH_DISABLE_SDPA'] = '1'
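
Alternatively, transformers lets you opt out of SDPA on a per-model basis by requesting the eager attention implementation when loading the model. A minimal sketch, reusing the model_id and bnb_config from your snippet:

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    quantization_config=bnb_config,
    attn_implementation="eager",  # use the eager attention path instead of SDPA
)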

Please let me know if you require any further assistance.

Thanks.

Hi @BalakrishnaCh ,
Thank you for your response.
Could you provide the code with the suggested fixes so that it is easier for me to run and analyse?
Thank you.

@msi-sbraun-11 , to help you out further, could you please provide the missing parts of your code (where you are importing the above-mentioned imports from) so that it is executable, along with the model.generate() call and the prompt you are using? That way I can better assist you with the issue.
