Kernel assertion errors on 5090 during generation with MXFP4 (gpt-oss) (stable on 4090)

#1
by cmp-nct - opened

File "/root/.cache/huggingface/hub/models--kernels-community--triton_kernels/snapshots/1d2e9557ac0d4c651055a209055748d4db0fe65b/build/torch-universal/triton_kernels/matmul_ogs_details/opt_flags.py", line 214, in make_default_opt_flags_nvidia
assert num_stages >= 1

I had to manually comment out that assertion to get it running.
Otherwise, about 30% of the batch sizes and prompt lengths I tried crash with an AssertionError when running gpt-oss-20b on my 5090.

On a 4090 I have no such problems.

kernels-community org

Thanks for this! Would you like to open a PR for that? Otherwise, I will sync the latest version in a few days, once it is a bit more stable on the triton_kernels side.

Hi, I don't think commenting it out is the right final solution, and I'm not experienced enough with Triton to fix the bug myself.

The problem is probably in the calculation that ends up assigning 0 to num_stages; someone with Triton experience needs to fix it.
The dev responsible for the function should take a look. Maybe it's something obvious.
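To illustrate the hypothesis above, here is a minimal, hypothetical sketch of how a software-pipelined matmul commonly derives its stage count from a shared-memory budget, and why that count can come out as 0. It is NOT the code in opt_flags.py. TileConfig, bytes_per_stage, estimate_num_stages, pick_config, the tile sizes, the byte widths, and the 101,376-byte budget are all assumptions for illustration, and MXFP4 scale storage is ignored. Rather than asserting, or silently continuing with 0 stages, it shrinks block_k until one stage fits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TileConfig:
    # Hypothetical tile shape; not the OptFlags type used by triton_kernels.
    block_m: int
    block_n: int
    block_k: int


def bytes_per_stage(cfg: TileConfig, a_bytes: float, b_bytes: float) -> int:
    """Shared memory one pipeline stage buffers: one A tile plus one B tile."""
    a_tile = cfg.block_m * cfg.block_k * a_bytes  # activations (block_m x block_k)
    b_tile = cfg.block_k * cfg.block_n * b_bytes  # weights (block_k x block_n)
    return int(a_tile + b_tile)


def estimate_num_stages(cfg: TileConfig, smem_capacity: int,
                        a_bytes: float, b_bytes: float, max_stages: int = 4) -> int:
    """How many stages fit in shared memory; 0 means not even one stage fits."""
    return min(max_stages, smem_capacity // bytes_per_stage(cfg, a_bytes, b_bytes))


def pick_config(cfg: TileConfig, smem_capacity: int, a_bytes: float,
                b_bytes: float, min_block_k: int = 16) -> tuple[TileConfig, int]:
    """Halve block_k until at least one stage fits, instead of asserting."""
    while True:
        stages = estimate_num_stages(cfg, smem_capacity, a_bytes, b_bytes)
        if stages >= 1:
            return cfg, stages
        if cfg.block_k // 2 < min_block_k:
            raise ValueError(f"no pipeline stage fits in {smem_capacity} bytes for {cfg}")
        cfg = replace(cfg, block_k=cfg.block_k // 2)
```

For example, with a 256x256x256 tile, 2-byte activations, 0.5-byte (4-bit) weights, and the assumed 101,376-byte budget, one stage needs 163,840 bytes, so the estimate is 0 (the case the assertion rejects). Halving block_k to 128 brings a stage down to 81,920 bytes, giving one stage. Whether the real kernel picks a different tile or smem budget on the 5090 than on the 4090 is not confirmed here.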
