This repository contains only the AttnGate weights for the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B model. The gates are used only during decoding. Note that the current inference framework is intended mainly for accuracy testing.
See the GitHub page for the sparse modeling code. Scripts for running the accuracy tests on math reasoning benchmarks are in eval/reasoning_tasks.
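For intuition about what decode-time attention gating is for, here is a minimal sketch of a generic block-sparse decoding step. It is not the AttnGate architecture from this repository. It assumes that cached keys are grouped into fixed-size blocks, that each block gets a gate score, and that exact attention for the new query token is then computed only over the top-k blocks. The gate used here (the query's dot product with each block's mean-pooled key) is a hypothetical stand-in for the learned gate, and the names `block_size`, `top_k`, and `sparse_decode_attention` are illustrative only. See the GitHub page for the actual gate design.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: List[Vector]) -> Vector:
    # Average a block of key vectors into a single block representative.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def gate_scores(query: Vector, keys: List[Vector], block_size: int) -> List[float]:
    # Hypothetical stand-in for a learned gate: score each KV block by q . mean(K_block).
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    return [dot(query, mean_pool(block)) for block in blocks]


def select_blocks(scores: List[float], top_k: int) -> List[int]:
    # Keep the indices of the top_k highest-scoring blocks, returned in positional order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def sparse_decode_attention(
    query: Vector, keys: List[Vector], values: List[Vector], block_size: int, top_k: int
) -> Tuple[Vector, List[int]]:
    # One decoding step: attend only to tokens that lie in the selected blocks.
    chosen = select_blocks(gate_scores(query, keys, block_size), top_k)
    idx = [
        t
        for b in chosen
        for t in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[t]) * scale for t in idx]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, t in zip(weights, idx):
        for d in range(len(out)):
            out[d] += (w / total) * values[t][d]
    return out, chosen


if __name__ == "__main__":
    keys = [[0, 1], [0, 1], [1, 0], [1, 0], [0, -1], [0, -1], [-1, 0], [-1, 0]]
    values = [[float(t)] for t in range(len(keys))]
    out, chosen = sparse_decode_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
    print(chosen, out)
```

With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention. Smaller `top_k` values trade some accuracy for reading fewer cached keys and values per generated token, which is why a framework like this one is evaluated on accuracy benchmarks.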
Model tree for SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-14B-Decode-AttnGates
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B