Is there a minimum CUDA and/or BitsAndBytes version requirement?

#88
by deathknight0 - opened

I'm running into issues using BitsAndBytes for quantization. I keep getting this cryptic CUDA error:

import torch
from transformers import BitsAndBytesConfig, Gemma3ForConditionalGeneration

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_dir, quantization_config=nf4_config
).eval()

..... (rest of code)
output = model.generate(**inputs, max_new_tokens=100)

Error:
output = model.generate(**inputs, max_new_tokens=100)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
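One thing worth noting about this traceback: CUDA kernel launches are asynchronous, so the Python line reported with a "device-side assert triggered" error is often not the line that actually failed. A general CUDA debugging step (not specific to this model) is to force synchronous launches before importing torch, so the traceback points at the real failing call:

```python
import os

# Must be set BEFORE importing torch: forces synchronous kernel launches so
# the stack trace identifies the kernel that actually tripped the assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ...then import torch, rebuild the model, and rerun model.generate(...)
```

With this set, generation is slower, but the resulting traceback is usually far more informative than the generic device-side assert message.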

When I don't use BitsAndBytes (I can't practically use the model this way, since I only have a single RTX 3090; I tried it just for debugging), I get this (presumably related) CUDA error:

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_dir, device_map='auto', torch_dtype=torch.bfloat16
).eval()
....
output = model.generate(**inputs, max_new_tokens=100)

Error:
output = model.generate(**inputs, max_new_tokens=100)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cutlassF: no kernel found to launch!
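For what it's worth, "cutlassF: no kernel found to launch!" typically comes from PyTorch's scaled-dot-product attention selecting a fused kernel that isn't supported by the current hardware/build combination. A common workaround (a sketch of a general Transformers option, not Gemma-specific guidance) is to force the eager attention path when loading:

```python
# Sketch: force eager attention so generate() avoids the fused SDPA kernels
# that raise "cutlassF: no kernel found to launch!". `model_dir` is the local
# checkpoint path from the snippets above.
import torch
from transformers import Gemma3ForConditionalGeneration

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",  # skip flash/memory-efficient attention kernels
).eval()
```

Eager attention is slower and uses more memory, but it isolates whether the failure is in the fused attention kernels rather than the model itself.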

Libraries:
torch 2.6.0+cu124
transformers 4.53.0
triton-windows 3.2.0.post11
accelerate 1.6.0
bitsandbytes 0.47.0

Is there a minimum torch/CUDA/BnB requirement to use this model?

Thanks in advance!

Hi,

Thanks for bringing this to our attention. There is no single minimum torch/CUDA requirement; it's a compatibility matrix. What matters is that your PyTorch/CUDA build, NVIDIA driver, and bitsandbytes installation are all compatible with each other and with your GPU's compute capability, and that all of these libraries are kept up to date. Your current combination appears to be broken, which is a common issue given the rapid pace of development in the AI library ecosystem.
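To make comparing stacks easier, here is a small stdlib-only sketch (the helper name is ours, not part of any library) that reports the installed versions of the relevant packages:

```python
from importlib import metadata

def report_versions(pkgs=("torch", "transformers", "accelerate", "bitsandbytes")):
    """Return {package: installed version, or None if not installed}."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = None
    return out

print(report_versions())
```

Running this in the same environment as the model and pasting the output into the discussion makes it much quicker to spot a mismatched combination.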

I was able to run 4-bit quantization locally with the same parameters you listed, without any issues. Please try the suggestions above and let us know if you need further assistance.

Thanks.
