How to only compress non-shared experts within transformer blocks?
#1 opened by CobraMamba
Which lib was used for this?
@CobraMamba the models were produced using the code from this repository: https://github.com/IST-DASLab/MoE-Quant
We also published a model with all layers quantized (including non-shared experts): https://huggingface.co/ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g.
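Not speaking for the MoE-Quant code itself, but as a rough sketch of the idea: in the Hugging Face DeepSeek implementation, routed experts live under `mlp.experts.<idx>` while shared experts live under `mlp.shared_experts`, so you can restrict quantization to the non-shared experts by filtering module names. The helper below (`select_routed_expert_linears` is a hypothetical name, and the name patterns are an assumption about the model's module layout) just collects those layers; you would then hand them to whatever quantization routine you use.

```python
import torch.nn as nn


def select_routed_expert_linears(model: nn.Module) -> dict[str, nn.Linear]:
    """Collect the nn.Linear layers belonging to routed (non-shared) experts.

    Assumes DeepSeek-style module naming: routed experts under
    `mlp.experts.<idx>`, shared experts under `mlp.shared_experts`.
    """
    targets = {}
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        # Keep only routed experts; skip shared experts, attention, and dense MLPs.
        if ".mlp.experts." in name and "shared_experts" not in name:
            targets[name] = module
    return targets


# Usage sketch: pass only these modules to your quantizer,
# leaving attention and shared experts in the original precision.
# expert_linears = select_routed_expert_linears(model)
```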