RuntimeError: CUDA error: device-side assert triggered
CUDA error while loading on multiple GPUs with device_map="auto", as the tutorial intended.
Code:
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
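For completeness, a minimal self-contained sketch of the setup; the model ID is assumed from the discussion linked further down, and the chat contents, dtype, and loading arguments are placeholders, since the snippet above omits them:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"  # assumption, taken from the linked discussion
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # placeholder dtype
    device_map="auto",           # shards the model across the available GPUs
)

chat = [{"role": "user", "content": "A long prompt goes here..."}]  # placeholder
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))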
Error:
----> 6 outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
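As the traceback suggests, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the assert is reported at the call that actually triggered it rather than at some later API call. A minimal way to enable it (the script name is a placeholder):

# From the shell, before launching:
#   CUDA_LAUNCH_BLOCKING=1 python run_gemma.py
# Or from Python, before torch / CUDA is initialized:
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
import torch  # must be imported after the environment variable is set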
I can confirm this issue.
I believe this is the same issue as:
https://huggingface.co/google/gemma-2-9b-it/discussions/14
I think it's something to do with the sliding window, but I couldn't fix it last night in an hour or two. I'll try to revisit when I have time, but if anyone else has a chance, hopefully this helps narrow things down.
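For anyone picking this up: the sliding-window size is exposed on the model config, so it is easy to check whether it lines up with the input length at which the assert starts firing. A quick inspection sketch, assuming the gemma-2-9b-it checkpoint from the linked discussion:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-2-9b-it")
print(config.sliding_window)           # sliding-window attention size
print(config.max_position_embeddings)  # full context length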
Yeah, it occurs when the input exceeds a certain size. I tried it with max_sequence_length = 4096 and truncation = True, but it still didn't work.
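For reference, truncation is applied when the prompt is tokenized, not at generate time; something along these lines caps the prompt length (4096 only mirrors the value tried above and, as noted, is not a confirmed fix):

inputs = tokenizer(
    prompt,
    add_special_tokens=False,
    truncation=True,
    max_length=4096,  # mirrors the value tried above; not a confirmed fix
    return_tensors="pt",
)
input_ids = inputs["input_ids"]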
Same error here. I tried to run it on the CPU, but got the following error:
IndexError: index 4480 is out of bounds for dimension 0 with size 4096
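In case it helps anyone reproduce the clearer CPU-side IndexError instead of the opaque device-side assert, a sketch of the CPU run; the model ID is assumed from the linked discussion, and the placeholder prompt only needs to exceed the ~4096-token threshold reported above:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"  # assumption, taken from the linked discussion
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_cpu = AutoModelForCausalLM.from_pretrained(model_id)  # defaults to CPU

prompt = "a long prompt " * 1500  # placeholder, long enough to exceed 4096 tokens
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model_cpu.generate(input_ids=inputs, max_new_tokens=150)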