quant script
#2 opened by ehartford
Can you please share the exact quantization script you used?
I am guessing you first converted to bf16 and then quantized from there to w4a16?
Did you use DeepSeek's script for the bf16 conversion?
https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py
We used https://github.com/IST-DASLab/MoE-Quant for quantization. It casts the weights to bf16 on the fly, and that casting code is essentially the DeepSeek script you linked.
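For anyone wondering what the cast does numerically, here is a minimal sketch of the block-wise dequantization arithmetic. It is not the MoE-Quant code or DeepSeek's script itself. It assumes the checkpoint stores one inverse scale per 128x128 weight tile and that dequantizing means multiplying each tile by its scale. The function name `blockwise_dequant` is hypothetical, and NumPy float32 arrays stand in for the FP8 and bf16 tensors that the real script handles with PyTorch on the GPU.

```python
import numpy as np

def blockwise_dequant(weight, scale_inv, block_size=128):
    """Multiply each (block_size x block_size) tile of `weight` by its scale.

    Illustrative only: float32 stands in for the FP8 input and bf16 output.
    """
    rows, cols = weight.shape
    # Expand the per-tile scales so there is one scale per weight element.
    scales = np.repeat(np.repeat(scale_inv, block_size, axis=0), block_size, axis=1)
    # Trim the expansion when the weight shape is not a multiple of block_size.
    scales = scales[:rows, :cols]
    return weight.astype(np.float32) * scales

# Toy example with 2x2 tiles so the result is easy to check by hand.
w = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
s = np.array([[0.5, 2.0], [1.0, 10.0]], dtype=np.float32)
out = blockwise_dequant(w, s, block_size=2)
print(out)
```

Doing this per layer as the weights are loaded avoids writing a full bf16 copy of the checkpoint to disk before quantizing to w4a16.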
ekurtic changed discussion status to closed