Is this a QAT model?

by Downtown-Case

Is this (and the rest of the Qwen FP8 model series) natively trained in FP8, or enhanced with quantization-aware training?

In other words, would we get the same results by converting the BF16 model to FP8 locally, or is this one optimized in some way beyond that?

This model is not a QAT model. It was quantized purely with post-training quantization techniques. It uses the MSE algorithm to determine the quantization scales for the weights, and it defines the quantization config needed for vLLM to quantize activations dynamically at runtime. llm-compressor makes it simple to produce this quantized model, and you should be able to reproduce the same model by running the tool locally.
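For reference, reproducing this kind of checkpoint with llm-compressor looks roughly like the sketch below. This is a minimal example assuming the `FP8_DYNAMIC` scheme (static per-channel weight scales computed offline, per-token activation scales computed dynamically by vLLM) and a placeholder model ID; the exact recipe used for this release, including how the MSE observer is configured, may differ from these defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder model ID for illustration; substitute the checkpoint you want to quantize.
MODEL_ID = "Qwen/Qwen3-8B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC: weight scales are computed offline (no calibration data needed);
# activation quantization is deferred to vLLM, which computes scales at runtime.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply post-training quantization in one shot.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format, which vLLM loads directly.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```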

Could you provide performance benchmarks, like the AWQ versions do?
