This repo contains only the AttnGate weights for the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B model. The gates are used only during decoding. Note that the current inference framework is intended mainly for accuracy tests.

See the GitHub page for the sparse modeling code. Scripts for running the accuracy tests on math reasoning benchmarks are in eval/reasoning_tasks.
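
Since this repo holds only the gate weights, the base model checkpoint presumably has to be obtained separately. Below is a minimal sketch for fetching both, assuming the `huggingface_hub` package is installed. The gate repository id is the one shown on this model page. The helper name `download_repo` is hypothetical and not part of the SeerAttention code, and how the gates are attached to the base model is defined by the code on the GitHub page, not here.

```python
# Sketch only: downloads the files, does not load or combine them.
GATE_REPO = "SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-14B-Decode-AttnGates"
BASE_REPO = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"


def download_repo(repo_id, local_dir=None):
    """Download a full Hub repository snapshot and return its local path.

    `download_repo` is a hypothetical helper for illustration.
    """
    # Imported lazily so the module can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example usage (requires network access and enough disk space for a 14B model):
# gate_path = download_repo(GATE_REPO)
# base_path = download_repo(BASE_REPO)
```

With `local_dir` left as `None`, files go to the default Hugging Face cache. The returned paths can then be passed to the evaluation scripts in whatever form they expect.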

