Is this a QAT model?

by Downtown-Case

Is this (and the rest of the Qwen FP8 model series) natively trained in FP8, or enhanced with quantization-aware training?

In other words, would we get the same results by converting the BF16 model to FP8 locally, or is this one optimized in some way beyond that?

This model is not a QAT model. It was quantized purely with post-training quantization techniques. It uses the MSE algorithm to determine the quantization scales for the weights, and it defines the quantization config needed for vLLM to quantize activations dynamically at runtime. llm-compressor makes it simple to produce this quantized model, and you should be able to reproduce the same model by running the tool locally.
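For reference, reproducing this kind of checkpoint with llm-compressor looks roughly like the sketch below. This is a minimal example assuming the `FP8_DYNAMIC` scheme (static per-channel weight scales computed offline, per-token activation scales computed dynamically by vLLM) and a placeholder model ID; the exact recipe used for this release, including how the MSE observer is configured, may differ from these defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder model ID for illustration; substitute the checkpoint you want to quantize.
MODEL_ID = "Qwen/Qwen3-8B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: weight scales are computed offline (no calibration data needed);
# activation quantization is deferred to vLLM, which computes scales at runtime.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply post-training quantization in one shot.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format, which vLLM loads directly.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```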

Could you provide performance benchmarks, like the AWQ versions do?
