How to compress only the non-shared experts within transformer blocks?

#1
Opened by CobraMamba

Which library was used to produce these models?

IST Austria Distributed Algorithms and Systems Lab org

@CobraMamba the models were produced with the code from this repository: https://github.com/IST-DASLab/MoE-Quant

We also published a model with all layers quantized (including the non-shared experts): https://huggingface.co/ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g.
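
For reference, here is a minimal sketch (not the MoE-Quant code itself) of how one could select only the routed (non-shared) expert weights of a DeepSeek-style MoE model for quantization with plain `transformers`/PyTorch. The module-name patterns (`mlp.experts.`, `shared_experts`) follow the DeepSeek-V3/R1 modeling code, but treat them as assumptions and verify them against the checkpoint you actually load; the downstream quantization call is left abstract.

```python
# Sketch: enumerate routed-expert linear layers of a DeepSeek-style MoE model,
# excluding shared experts, attention, and router weights.
import torch.nn as nn
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)

# Build the model skeleton on the meta device so module names can be inspected
# without materializing the full checkpoint in memory.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

def is_routed_expert_linear(name: str, module: nn.Module) -> bool:
    # Routed experts live under `mlp.experts.<idx>.`; shared experts live
    # under `mlp.shared_experts.` and are excluded here.
    return (
        isinstance(module, nn.Linear)
        and ".mlp.experts." in name
        and "shared_experts" not in name
    )

targets = [name for name, mod in model.named_modules() if is_routed_expert_linear(name, mod)]
print(f"{len(targets)} linear layers selected for quantization")

# `targets` can then be handed to whatever quantization routine you use
# (e.g. a GPTQ implementation that accepts an explicit layer list), leaving
# attention, routers, and shared experts in the original precision.
```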
