Do you have deepseek-r1-0528-awq plan?

#68
by oliver0102 - opened

The AWQ version runs smoothly on 8*H20, which is the most powerful graphics card I have.

@v2ray from https://huggingface.co/cognitivecomputations will follow up with awq soon

Maybe? AWQ has moved to vllm-compressor, and it doesn't fully work for MoE yet;
we will give it a try.

I am doing DeepSeek-R1-Zero AWQ quants now. If that finishes and works successfully, I'll try to make AWQ quants of this model.


I am using your DeepSeek-R1-AWQ every day. I can run up to 220 tokens/s on 8*H20, and I didn't notice any difference compared with the original model. AWQ is a really good quantization method, so I am looking forward to an AWQ version being released soon after this LITTLE upgrade of R1. DeepSeek-R1 is good at programming but has a significant hallucination issue, which prevents us from serving it as the base model for one of our code-related agents. Hopefully this one resolves that issue.

@oliver0102 @erichartford

Here's an AWQ quant for the new DeepSeek R1-0528 - https://huggingface.co/adamo1139/DeepSeek-R1-0528-AWQ
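For anyone who wants to try it, here is a minimal launch sketch for serving that quant with vLLM on an 8-GPU node. This is an illustrative command, not from the quant's model card: the exact flags (and whether `--quantization awq` is needed at all, since vLLM usually auto-detects AWQ checkpoints) depend on your vLLM version, so check its docs.

```shell
# Sketch: serve the AWQ quant with vLLM across 8 GPUs (e.g. 8*H20).
# Assumes vLLM is installed and there is enough GPU memory for the model.
vllm serve adamo1139/DeepSeek-R1-0528-AWQ \
    --tensor-parallel-size 8 \
    --quantization awq \
    --trust-remote-code
```

Once the server is up, it exposes an OpenAI-compatible API on port 8000 by default.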

Eric, thanks for your notes in the discussion here, they were really helpful.
