What is the difference between Qwen/Qwen3-32B-FP8 and this quantized model?

#1 by traphix

Is there any difference from Qwen/Qwen3-32B-FP8?

Big thanks for this quantization. For whatever reason I was unable to run the FP8 version provided by Qwen; it kept crashing with

ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")

However, this one runs great in vLLM.
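
For anyone hitting the same error, here is a minimal sketch of loading a quantized checkpoint with vLLM's offline API. The repo id below is a placeholder (this thread doesn't name the exact repo), and the context length and tensor-parallel settings are illustrative assumptions to adjust for your hardware, not recommendations from the uploader:

```python
# Minimal vLLM offline-inference sketch for an FP8/quantized Qwen3-32B checkpoint.
# "your-org/Qwen3-32B-quantized" is a placeholder repo id, not the actual model name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/Qwen3-32B-quantized",  # placeholder: substitute the real repo id
    max_model_len=8192,                    # assumption: keep context modest to fit GPU memory
    tensor_parallel_size=2,                # assumption: set to the number of GPUs you have
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```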
