This repo contains only the AttnGate weights for the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B model. The gates are used only for decoding. Note that the current inference framework is intended mainly for accuracy tests.

Please see the GitHub page for the sparse modeling code. Scripts for running the accuracy tests on math reasoning benchmarks can be found in eval/reasoning_tasks.
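Since this repo holds only the gate weights, they have to be fetched separately from the base model. A minimal sketch of one way to do that, assuming the `huggingface_hub` package is installed; the helper name `download_attn_gates` and the `attn_gates` directory are hypothetical, and the downloaded files still need the sparse modeling code from the GitHub page to be of any use:

```python
from pathlib import Path

# Repo id of this model page; the base model id is listed for reference only.
GATE_REPO_ID = "SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-14B-Decode-AttnGates"
BASE_MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"


def download_attn_gates(local_dir: str = "attn_gates") -> Path:
    """Download the AttnGate weight files into local_dir and return that path.

    Hypothetical helper for illustration; it is not part of the SeerAttention code.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo and returns the local folder path.
    path = snapshot_download(repo_id=GATE_REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_attn_gates()` returns the local folder containing the gate files. File names inside that folder depend on the repo contents and are not assumed here.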

