How to compress only the non-shared experts within transformer blocks?

#1
Opened by CobraMamba

Which library was used to produce these models?

IST Austria Distributed Algorithms and Systems Lab org

@CobraMamba the models were produced with the code from this repository: https://github.com/IST-DASLab/MoE-Quant

We also published a model with all layers quantized (including the non-shared experts): https://huggingface.co/ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g.
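
For reference, here is a minimal sketch (not the MoE-Quant code itself) of how one could select only the routed (non-shared) expert weights of a DeepSeek-style MoE model for quantization with plain `transformers`/PyTorch. The module-name patterns (`mlp.experts.`, `shared_experts`) follow the DeepSeek-V3/R1 modeling code, but treat them as assumptions and verify them against the checkpoint you actually load; the downstream quantization call is left abstract.

```python
# Sketch: enumerate routed-expert linear layers of a DeepSeek-style MoE model,
# excluding shared experts, attention, and router weights.
import torch.nn as nn
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)

# Build the model skeleton on the meta device so module names can be inspected
# without materializing the full checkpoint in memory.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

def is_routed_expert_linear(name: str, module: nn.Module) -> bool:
    # Routed experts live under `mlp.experts.<idx>.`; shared experts live
    # under `mlp.shared_experts.` and are excluded here.
    return (
        isinstance(module, nn.Linear)
        and ".mlp.experts." in name
        and "shared_experts" not in name
    )

targets = [name for name, mod in model.named_modules() if is_routed_expert_linear(name, mod)]
print(f"{len(targets)} linear layers selected for quantization")

# `targets` can then be handed to whatever quantization routine you use
# (e.g. a GPTQ implementation that accepts an explicit layer list), leaving
# attention, routers, and shared experts in the original precision.
```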
