quant script

#2
by ehartford - opened

Can you please share the exact quant script you used?

I'm guessing you first converted to bf16 and then quantized from there to w4a16?

Did you use DeepSeek's script for the bf16 conversion?

https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py

Red Hat AI org

We used https://github.com/IST-DASLab/MoE-Quant for quantization. It casts the weights to bf16 on the fly, but the casting code is essentially the DeepSeek script you linked.
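For readers unfamiliar with that cast, here is a rough sketch of the idea, not the code from either repository. It assumes the FP8 checkpoint stores one scale per square tile of each weight matrix (128x128 tiles, as far as I understand the DeepSeek-V3 format) and that dequantizing means multiplying every element by its tile's scale. The function name `dequantize_blockwise` and the plain-list representation are hypothetical and chosen only for illustration; the real scripts operate on GPU tensors.

```python
def dequantize_blockwise(weight, scale, block_size=128):
    """Multiply each element by the scale of the tile it falls in.

    weight: list of rows, values already decoded from FP8 to Python floats
    scale:  list of rows, one entry per (block_size x block_size) tile
    Both names are illustrative assumptions, not an existing API.
    """
    return [
        [value * scale[i // block_size][j // block_size]
         for j, value in enumerate(row)]
        for i, row in enumerate(weight)
    ]


# Toy example using 2x2 tiles instead of 128x128.
weight = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
]
scale = [
    [0.5, 2.0],    # tiles covering rows 0-1
    [10.0, 0.25],  # tiles covering row 2 (a partial tile)
]
dequantized = dequantize_blockwise(weight, scale, block_size=2)
```

Doing this per layer while quantizing avoids writing a full bf16 copy of the checkpoint to disk first, which is presumably why the cast happens on the fly.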

ekurtic changed discussion status to closed
