What is the difference between Qwen/Qwen3-32B-FP8 and this quantized model?
#1 by traphix
Big thanks for this quantization. For whatever reason I was unable to run the FP8 version provided by Qwen; it kept crashing with:

ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")

This one, however, runs great in vLLM.
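For context, that Triton error typically appears on GPUs older than Ada Lovelace / Hopper (compute capability below 8.9), where the fp8e4nv (E4M3) dtype isn't available, so a differently quantized checkpoint can sidestep it. Below is a minimal sketch of loading such a model with vLLM's offline API. The repo id and quantization method are placeholders, since the thread doesn't name them:

```python
# Minimal vLLM sketch, assuming a hypothetical quantized Qwen3-32B repo id.
# The repo name and quantization method (e.g. AWQ vs. GPTQ) are assumptions,
# not details confirmed by this thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="someuser/Qwen3-32B-AWQ",  # placeholder repo id
    quantization="awq",              # match the checkpoint's method; vLLM can often auto-detect
    max_model_len=8192,              # keep the KV cache modest on a single GPU
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain FP8 vs. AWQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

If the quantization method is stored in the checkpoint's config, omitting the `quantization` argument and letting vLLM detect it is usually the safer default.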