AssertionError: Both operands must be same dtype. Got fp16 and bf16

#8
by treehugg3 - opened

I get this error when running the demo sample script:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 102:13:
        zeros = tl.interleave(zeros, zeros)
        zeros = tl.interleave(zeros, zeros)
        zeros = tl.broadcast_to(zeros, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        offsets_s = N * offsets_szk[:, None] + offsets_sn[None, :]
        masks_sk = offsets_szk < K // group_size
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
             ^
IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16')

Ubuntu 22.04, latest transformers from git, triton==3.2.0, autoawq==0.2.8.

The error IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16') is solved by ensuring you use torch_dtype="auto". Don't set it to torch.float16 like autoawq recommends.
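
For reference, a minimal loading sketch of that fix (standard transformers classes for Qwen2.5-VL; the only important part is the torch_dtype argument):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-32B-Instruct-AWQ"

# Let transformers pick the dtype from the checkpoint config instead of
# forcing torch.float16 as the AutoAWQ examples suggest.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",   # not torch.float16
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)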

But now I get this other error:

Traceback (most recent call last):
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 1548, in dot
    return semantic.dot(input, other, acc, input_precision, max_num_imprecise_acc, out_dtype, _builder)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/semantic.py", line 1470, in dot
    assert lhs.dtype == rhs.dtype, f"Both operands must be same dtype. Got {lhs.dtype} and {rhs.dtype}"
AssertionError: Both operands must be same dtype. Got fp16 and bf16

The above exception was the direct cause of the following exception:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 108:22:
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
        zeros = (zeros >> shifts) & 0xF
        b = (b - zeros) * scales
        b = b.to(c_ptr.type.element_ty)

        # Accumulate results.
        accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype)
                      ^
treehugg3 changed discussion title from "invalid operands of type triton.language.float16 and triton.language.float16" to "AssertionError: Both operands must be same dtype. Got fp16 and bf16"

A temporary fix was to install this transformers commit:

pip install git+https://github.com/huggingface/transformers.git@8ee50537fe7613b87881cd043a85971c85e99519

Thanks to https://github.com/Deep-Agent/R1-V/issues/105

Does this solution still work? With the old transformers version (4.50.0.dev0) I get this error when loading the model, because the newer AutoAWQ tries to import Qwen3 classes that don't exist in that pinned commit:

Traceback (most recent call last):
  File "/home/aiscuser/test32_awq.py", line 71, in <module>
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4201, in from_pretrained
    hf_quantizer.preprocess_model(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/quantizers/base.py", line 194, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/quantizers/quantizer_awq.py", line 107, in _process_model_before_weight_loading
    model, has_been_replaced = replace_with_awq_linear(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/integrations/awq.py", line 134, in replace_with_awq_linear
    from awq.modules.linear.gemm import WQLinear_GEMM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/__init__.py", line 24, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/models/__init__.py", line 18, in <module>
    from .qwen3 import Qwen3AWQForCausalLM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/models/qwen3.py", line 4, in <module>
    from transformers.models.qwen3.modeling_qwen3 import (
ModuleNotFoundError: No module named 'transformers.models.qwen3'

On the latest transformers version (4.51.3), I get the error you mentioned at accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype) when I run a script from the shell, and this error when I run the model in a notebook:

reverse_awq_order_tensor = (
        (tl.arange(0, 2) * 4)[None, :] + tl.arange(0, 4)[:, None]
    ).reshape(8)

    # Use this to compute a set of shifts that can be used to unpack and
    # reorder the values in iweights and zeros.
    shifts = reverse_awq_order_tensor * 4
    shifts = tl.broadcast_to(shifts[None, :], (BLOCK_SIZE_Y * BLOCK_SIZE_X, 8))
    shifts = tl.reshape(shifts, (BLOCK_SIZE_Y, BLOCK_SIZE_X * 8))

    # Unpack and reorder: shift out the correct 4-bit value and mask.
    iweights = (iweights >> shifts) & 0xF
                ^
IncompatibleTypeErrorImpl('invalid operands of type triton.language.float32 and triton.language.float32')
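
The >> on that line only compiles for integer operands, and in a correctly loaded AWQ checkpoint the packed weight buffers are int32, so a float operand here usually means the quantized buffers reached the Triton kernel with the wrong dtype. A hypothetical sanity check, assuming the usual AutoAWQ WQLinear_GEMM layout (packed int32 qweight/qzeros, half-precision scales):

import torch

# Hypothetical helper: walk the loaded model and flag any AWQ buffer that is not
# int32. On a correctly loaded checkpoint this should print nothing.
def check_awq_buffers(model):
    for name, module in model.named_modules():
        for buf_name in ("qweight", "qzeros"):
            buf = getattr(module, buf_name, None)
            if isinstance(buf, torch.Tensor) and buf.dtype != torch.int32:
                print(f"{name}.{buf_name} has dtype {buf.dtype}, expected torch.int32")

Calling check_awq_buffers(model) right after from_pretrained should print nothing if the packed layout survived loading.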

Throughout, I am using torch_dtype="auto".

Anyone have any idea how to fix this?

Please fix this asap

same issue here

return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,

triton.compiler.errors.CompilationError: at 56:16:
reverse_awq_order_tensor = (
        (tl.arange(0, 2) * 4)[None, :] + tl.arange(0, 4)[:, None]
    ).reshape(8)

    # Use this to compute a set of shifts that can be used to unpack and
    # reorder the values in iweights and zeros.
    shifts = reverse_awq_order_tensor * 4
    shifts = tl.broadcast_to(shifts[None, :], (BLOCK_SIZE_Y * BLOCK_SIZE_X, 8))
    shifts = tl.reshape(shifts, (BLOCK_SIZE_Y, BLOCK_SIZE_X * 8))

    # Unpack and reorder: shift out the correct 4-bit value and mask.
    iweights = (iweights >> shifts) & 0xF
                ^

IncompatibleTypeErrorImpl('invalid operands of type triton.language.float32 and triton.language.float32')

Please fix this asap

Same issue here

The error IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16') is solved by ensuring you use torch_dtype="auto". Don't set it to torch.float16 like autoawq recommends.

But now I get this other error:

Traceback (most recent call last):
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 1548, in dot
    return semantic.dot(input, other, acc, input_precision, max_num_imprecise_acc, out_dtype, _builder)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/semantic.py", line 1470, in dot
    assert lhs.dtype == rhs.dtype, f"Both operands must be same dtype. Got {lhs.dtype} and {rhs.dtype}"
AssertionError: Both operands must be same dtype. Got fp16 and bf16

The above exception was the direct cause of the following exception:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 108:22:
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
        zeros = (zeros >> shifts) & 0xF
        b = (b - zeros) * scales
        b = b.to(c_ptr.type.element_ty)

        # Accumulate results.
        accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype)
                      ^

I think this might NOT be a problem with this specific Qwen2.5 model. I tried https://huggingface.co/gaunernst/gemma-3-27b-it-int4-awq, but got the exact same error.

Here, my AutoAWQ version is 0.2.9, torch is 2.6.0, and transformers is 4.51.3.

Have you solved it?

Same issue here

same issue

Same issue here

Still an issue

Same issue

Same issue here. Fixed with pip install -U transformers:
torch==2.7.1+cu128
torchaudio==2.7.1+cu128
torchvision==0.22.1+cu128
Successfully installed transformers-4.53.0
Ubuntu 22.04

That solved it.

I had the same issue with transformers 4.51.3 and torch 2.6.0+cu124. I then created another Python venv with torch==2.7.1+cu126 and transformers==4.53.0, both environments under WSL Ubuntu 20.04, and now it finally works.
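
For anyone landing here later, a minimal sketch of an environment check based on the combination reported working in this thread (transformers 4.53.0, torch 2.7.1); these numbers come from the comments above, not from an official compatibility matrix:

from packaging import version

import torch
import transformers

# Versions reported working in this thread; adjust as newer releases appear.
MIN_TRANSFORMERS = "4.53.0"
MIN_TORCH = "2.7.1"

assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS), transformers.__version__
assert version.parse(torch.__version__.split("+")[0]) >= version.parse(MIN_TORCH), torch.__version__
print("torch:", torch.__version__, "| transformers:", transformers.__version__)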
