Kernel assertion errors on RTX 5090 when generating with MXFP4 (gpt-oss); stable on RTX 4090
File "/root/.cache/huggingface/hub/models--kernels-community--triton_kernels/snapshots/1d2e9557ac0d4c651055a209055748d4db0fe65b/build/torch-universal/triton_kernels/matmul_ogs_details/opt_flags.py", line 214, in make_default_opt_flags_nvidia
assert num_stages >= 1
I had to manually comment out that assertion to get it running.
Otherwise, roughly 30% of the batch size and prompt length combinations I tried crash with an AssertionError when running gpt-oss-20b on my 5090.
On my 4090 I have no such problems.
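A failure rate like "roughly 30%" is easier for maintainers to act on if the exact failing combinations are listed. Below is a minimal, hypothetical sketch of a sweep harness for collecting them; `sweep_failures` and `fake_run` are illustrative names, not part of transformers or triton_kernels, and the stand-in `fake_run` only simulates a kernel assert.

```python
import itertools
from typing import Callable, Iterable, List, Tuple


def sweep_failures(run: Callable[[int, int], None],
                   batch_sizes: Iterable[int],
                   prompt_lengths: Iterable[int]) -> List[Tuple[int, int]]:
    """Call run(batch_size, prompt_len) for every combination and
    return the (batch_size, prompt_len) pairs that raised AssertionError."""
    failures = []
    for bs, plen in itertools.product(batch_sizes, prompt_lengths):
        try:
            run(bs, plen)
        except AssertionError:
            # Record the combination instead of crashing the whole sweep.
            failures.append((bs, plen))
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for a real generation call, used only to
    # exercise the harness.
    def fake_run(batch_size: int, prompt_len: int) -> None:
        assert (batch_size * prompt_len) % 3 != 0, "simulated kernel assert"

    bss, plens = [1, 2, 3, 4], [64, 128, 256]
    failed = sweep_failures(fake_run, bss, plens)
    rate = len(failed) / (len(bss) * len(plens))
    print(f"{len(failed)} failing combos ({rate:.0%}): {failed}")
```

In a real repro, `run` would build a batch of `bs` prompts of `plen` tokens and call `model.generate` on gpt-oss-20b; only AssertionError is caught, so any other error still surfaces.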
Thanks for this! Would you like to open a PR for it? Otherwise, I will sync the latest version in a few days, once things are a bit more stable on the triton kernels side.
Hi, I think commenting it out is not the right final solution, but I'm not experienced enough in Triton to fix the bug myself.
The problem is probably in the calculation that ends up assigning 0 to num_stages; someone with Triton experience needs to fix it.
The developer responsible for that function should take a look; maybe it's something obvious.
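To illustrate how a calculation like this can produce 0, here is a hypothetical sketch. It is NOT the actual code in opt_flags.py; it assumes the common pattern where the pipeline depth is the shared-memory budget divided by the bytes one stage needs (one A tile plus one B tile). The function names, the 99 KiB budget, the 16 KiB epilogue reservation and the tile sizes are all illustrative assumptions, not measured values for either GPU. If a single stage does not fit, integer division yields 0, and commenting out `assert num_stages >= 1` only hides that the chosen tile configuration is infeasible.

```python
# Hypothetical sketch only: not the triton_kernels implementation.

def bytes_per_stage(block_m: int, block_n: int, block_k: int,
                    a_bits: int, b_bits: int) -> int:
    """Shared memory needed to buffer one A tile and one B tile."""
    a_tile = block_m * block_k * a_bits // 8
    b_tile = block_n * block_k * b_bits // 8
    return a_tile + b_tile


def pick_num_stages(smem_budget: int, stage_bytes: int,
                    epilogue_bytes: int = 0, max_stages: int = 4) -> int:
    """Number of pipeline stages that fit in the budget; may be 0."""
    usable = max(smem_budget - epilogue_bytes, 0)
    return min(max_stages, usable // stage_bytes)


def shrink_block_k_until_fits(smem_budget: int, block_m: int, block_n: int,
                              block_k: int, a_bits: int, b_bits: int,
                              epilogue_bytes: int = 0,
                              min_block_k: int = 32) -> tuple:
    """One possible fallback (an assumption, not a proposed upstream fix):
    halve block_k until at least one stage fits; return (block_k, stages)."""
    while block_k >= min_block_k:
        stage = bytes_per_stage(block_m, block_n, block_k, a_bits, b_bits)
        stages = pick_num_stages(smem_budget, stage, epilogue_bytes)
        if stages >= 1:
            return block_k, stages
        block_k //= 2
    raise ValueError("no block_k fits the shared-memory budget")


if __name__ == "__main__":
    budget = 99 * 1024      # illustrative budget, assumed
    epilogue = 16 * 1024    # illustrative epilogue reservation, assumed
    # bf16 activations (16 bits) x 4-bit weights, illustrative tile sizes.
    stage = bytes_per_stage(128, 256, 256, a_bits=16, b_bits=4)
    print(stage, pick_num_stages(budget, stage, epilogue_bytes=epilogue))
    print(shrink_block_k_until_fits(budget, 128, 256, 256, 16, 4,
                                    epilogue_bytes=epilogue))
```

With these made-up numbers a 256-wide K tile needs 98304 bytes per stage, which exceeds the 84992 usable bytes, so the stage count is 0; halving `block_k` to 128 makes one stage fit. Whether the real bug is a tile-size choice, a wrong shared-memory capacity for the 5090, or something else is for the triton_kernels maintainers to determine.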