AssertionError: Both operands must be same dtype. Got fp16 and bf16

#8
by treehugg3 - opened

I get this error when running the demo sample script:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 102:13:
        zeros = tl.interleave(zeros, zeros)
        zeros = tl.interleave(zeros, zeros)
        zeros = tl.broadcast_to(zeros, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        offsets_s = N * offsets_szk[:, None] + offsets_sn[None, :]
        masks_sk = offsets_szk < K // group_size
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
             ^
IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16')

Ubuntu 22.04, latest transformers from git, triton==3.2.0, autoawq==0.2.8.

The error IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16') is solved by ensuring you use torch_dtype="auto". Don't set it to torch.float16 like autoawq recommends.
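
For reference, a minimal loading sketch of that fix (standard transformers classes for Qwen2.5-VL; the only important part is the torch_dtype argument):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-32B-Instruct-AWQ"

# Let transformers pick the dtype from the checkpoint config instead of
# forcing torch.float16 as the AutoAWQ examples suggest.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",   # not torch.float16
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)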

But now I get this other error:

Traceback (most recent call last):
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 1548, in dot
    return semantic.dot(input, other, acc, input_precision, max_num_imprecise_acc, out_dtype, _builder)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/semantic.py", line 1470, in dot
    assert lhs.dtype == rhs.dtype, f"Both operands must be same dtype. Got {lhs.dtype} and {rhs.dtype}"
AssertionError: Both operands must be same dtype. Got fp16 and bf16

The above exception was the direct cause of the following exception:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 108:22:
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
        zeros = (zeros >> shifts) & 0xF
        b = (b - zeros) * scales
        b = b.to(c_ptr.type.element_ty)

        # Accumulate results.
        accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype)
                      ^
treehugg3 changed discussion title from "invalid operands of type triton.language.float16 and triton.language.float16" to "AssertionError: Both operands must be same dtype. Got fp16 and bf16"

A temporary fix was to install this transformers commit:

pip install git+https://github.com/huggingface/transformers.git@8ee50537fe7613b87881cd043a85971c85e99519

Thanks to https://github.com/Deep-Agent/R1-V/issues/105

Does this solution still work? With the old transformers version (4.50.0.dev0) I get this error when loading the model, because the newer AutoAWQ tries to import Qwen3 classes that don't exist in that pinned commit:

Traceback (most recent call last):
  File "/home/aiscuser/test32_awq.py", line 71, in <module>
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4201, in from_pretrained
    hf_quantizer.preprocess_model(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/quantizers/base.py", line 194, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/quantizers/quantizer_awq.py", line 107, in _process_model_before_weight_loading
    model, has_been_replaced = replace_with_awq_linear(
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/transformers/integrations/awq.py", line 134, in replace_with_awq_linear
    from awq.modules.linear.gemm import WQLinear_GEMM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/__init__.py", line 24, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/models/__init__.py", line 18, in <module>
    from .qwen3 import Qwen3AWQForCausalLM
  File "/home/aiscuser/.conda/envs/slm_ot/lib/python3.9/site-packages/awq/models/qwen3.py", line 4, in <module>
    from transformers.models.qwen3.modeling_qwen3 import (
ModuleNotFoundError: No module named 'transformers.models.qwen3'

On the latest transformers version (4.51.3), I get the error you mentioned at accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype) when I run a script from the shell, and this error when I run the model in a notebook:

reverse_awq_order_tensor = (
        (tl.arange(0, 2) * 4)[None, :] + tl.arange(0, 4)[:, None]
    ).reshape(8)

    # Use this to compute a set of shifts that can be used to unpack and
    # reorder the values in iweights and zeros.
    shifts = reverse_awq_order_tensor * 4
    shifts = tl.broadcast_to(shifts[None, :], (BLOCK_SIZE_Y * BLOCK_SIZE_X, 8))
    shifts = tl.reshape(shifts, (BLOCK_SIZE_Y, BLOCK_SIZE_X * 8))

    # Unpack and reorder: shift out the correct 4-bit value and mask.
    iweights = (iweights >> shifts) & 0xF
                ^
IncompatibleTypeErrorImpl('invalid operands of type triton.language.float32 and triton.language.float32')
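
The >> on that line only compiles for integer operands, and in a correctly loaded AWQ checkpoint the packed weight buffers are int32, so a float operand here usually means the quantized buffers reached the Triton kernel with the wrong dtype. A hypothetical sanity check, assuming the usual AutoAWQ WQLinear_GEMM layout (packed int32 qweight/qzeros, half-precision scales):

import torch

# Hypothetical helper: walk the loaded model and flag any AWQ buffer that is not
# int32. On a correctly loaded checkpoint this should print nothing.
def check_awq_buffers(model):
    for name, module in model.named_modules():
        for buf_name in ("qweight", "qzeros"):
            buf = getattr(module, buf_name, None)
            if isinstance(buf, torch.Tensor) and buf.dtype != torch.int32:
                print(f"{name}.{buf_name} has dtype {buf.dtype}, expected torch.int32")

Calling check_awq_buffers(model) right after from_pretrained should print nothing if the packed layout survived loading.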

Throughout, I am using torch_dtype="auto".

Anyone have any idea how to fix this?

Please fix this asap

same issue here

return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,

triton.compiler.errors.CompilationError: at 56:16:
reverse_awq_order_tensor = (
        (tl.arange(0, 2) * 4)[None, :] + tl.arange(0, 4)[:, None]
    ).reshape(8)

    # Use this to compute a set of shifts that can be used to unpack and
    # reorder the values in iweights and zeros.
    shifts = reverse_awq_order_tensor * 4
    shifts = tl.broadcast_to(shifts[None, :], (BLOCK_SIZE_Y * BLOCK_SIZE_X, 8))
    shifts = tl.reshape(shifts, (BLOCK_SIZE_Y, BLOCK_SIZE_X * 8))

    # Unpack and reorder: shift out the correct 4-bit value and mask.
    iweights = (iweights >> shifts) & 0xF
                ^

IncompatibleTypeErrorImpl('invalid operands of type triton.language.float32 and triton.language.float32')

Please fix this asap

Same issue here

The error IncompatibleTypeErrorImpl('invalid operands of type triton.language.float16 and triton.language.float16') is solved by ensuring you use torch_dtype="auto". Don't set it to torch.float16 like autoawq recommends.

But now I get this other error:

Traceback (most recent call last):
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/core.py", line 1548, in dot
    return semantic.dot(input, other, acc, input_precision, max_num_imprecise_acc, out_dtype, _builder)
  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/language/semantic.py", line 1470, in dot
    assert lhs.dtype == rhs.dtype, f"Both operands must be same dtype. Got {lhs.dtype} and {rhs.dtype}"
AssertionError: Both operands must be same dtype. Got fp16 and bf16

The above exception was the direct cause of the following exception:

  File "/Qwen2.5-VL-32B-Instruct-AWQ/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 108:22:
        masks_s = masks_sk[:, None] & masks_sn[None, :]
        scales_ptrs = scales_ptr + offsets_s
        scales = tl.load(scales_ptrs, mask=masks_s)
        scales = tl.broadcast_to(scales, (BLOCK_SIZE_K, BLOCK_SIZE_N))

        b = (b >> shifts) & 0xF
        zeros = (zeros >> shifts) & 0xF
        b = (b - zeros) * scales
        b = b.to(c_ptr.type.element_ty)

        # Accumulate results.
        accumulator = tl.dot(a, b, accumulator, out_dtype=accumulator_dtype)
                      ^

I think this might NOT be a problem with this specific Qwen2.5 model. I tried https://huggingface.co/gaunernst/gemma-3-27b-it-int4-awq, but got the exact same error.

Here, my AutoAWQ version is 0.2.9, torch is 2.6.0, and transformers is 4.51.3.

Have you solved it?

Same issue here

same issue

Same issue here

Still an issue

Same issue

Same issue here. Fixed with pip install -U transformers:
torch==2.7.1+cu128
torchaudio==2.7.1+cu128
torchvision==0.22.1+cu128
Successfully installed transformers-4.53.0
Ubuntu 22.04

That solved it.

I had the same issue with transformers 4.51.3 and torch 2.6.0+cu124. I then created another Python venv with torch==2.7.1+cu126 and transformers==4.53.0, both environments under WSL Ubuntu 20.04, and now it finally works.
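
For anyone landing here later, a minimal sketch of an environment check based on the combination reported working in this thread (transformers 4.53.0, torch 2.7.1); these numbers come from the comments above, not from an official compatibility matrix:

from packaging import version

import torch
import transformers

# Versions reported working in this thread; adjust as newer releases appear.
MIN_TRANSFORMERS = "4.53.0"
MIN_TORCH = "2.7.1"

assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS), transformers.__version__
assert version.parse(torch.__version__.split("+")[0]) >= version.parse(MIN_TORCH), torch.__version__
print("torch:", torch.__version__, "| transformers:", transformers.__version__)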
