cgus committed
Commit dbe0386 · verified · 1 Parent(s): d659990

Update README.md

Files changed (1): README.md (+30 -2)
README.md CHANGED
@@ -1,3 +1,31 @@
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
@@ -57,7 +85,7 @@ DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

 <p align="center">
- <img width="80%" src="figures/benchmark.jpg">
 </p>

 ## 2. Model Summary
@@ -206,4 +234,4 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 ```

 ## 9. Contact
- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
 
+ ---
+ license: apache-2.0
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+ ---
+ # DeepSeek-R1-exl2
+ Original model: [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
+ Model creator: [DeepSeek](https://huggingface.co/deepseek-ai)
+
+ ## Quants
+ [4bpw h6 (main)](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/main)
+ [4.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/4.5bpw-h6)
+ [5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5bpw-h6)
+ [5.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5.5bpw-h6)
+ [6bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/6bpw-h6)
+ [8bpw h8](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/8bpw-h8)
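+
+ A minimal sketch of fetching one of the branches above with the `huggingface_hub` Python library; the `local_dir` value is just an example destination, not a required path:
+ ```python
+ # Download one quant branch of this repo (assumes: pip install huggingface_hub)
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="cgus/DeepSeek-R1-Distill-Qwen-14B-exl2",
+     revision="5bpw-h6",  # any branch name from the list above
+     local_dir="DeepSeek-R1-Distill-Qwen-14B-exl2-5bpw",  # example local path
+ )
+ ```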
+
+ ## Quantization notes
+ Made with Exllamav2 0.2.7 using the standard calibration dataset.
+ Exl2 quants can be used with apps that support the exllamav2 library, such as TabbyAPI, Text-Generation-WebUI, and LoLLMs, and possibly with KoboldAI (not KoboldCpp).
+ Exl2 quants require an Nvidia RTX card on Windows. On Linux, both Nvidia RTX and AMD ROCm cards can be used.
+ For the best performance, the model must fit entirely in GPU VRAM.
+ On Windows, the model might still be usable if the Nvidia driver offloads a *small* amount to system RAM, but with a significant performance loss.
+ On Linux, or with a multi-GPU setup on Windows, it will simply crash with an out-of-memory (OOM) error, since the library doesn't support offloading natively.
+ If offloading is required, please use GGUF quants instead.
+ On my RTX 3060 12GB machine I can load the 5bpw quant with about 16384 context when Q4 cache is enabled; a minimal sketch of such a setup follows below.
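+
+ As a rough illustration of those numbers, here is a minimal sketch of loading an exl2 quant directly with the exllamav2 Python library, following its documented quickstart pattern; the model path is an assumption, and frontends like TabbyAPI handle all of this for you:
+ ```python
+ # Sketch: load an exl2 quant at 16384 context with Q4 cache, then generate.
+ # Assumes exllamav2 is installed and the quant was downloaded locally.
+ from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
+ from exllamav2.generator import ExLlamaV2DynamicGenerator
+
+ model_dir = "DeepSeek-R1-Distill-Qwen-14B-exl2-5bpw"  # example local path
+ config = ExLlamaV2Config(model_dir)
+ model = ExLlamaV2(config)
+ # Q4 cache stores the KV cache in ~4 bits instead of FP16, roughly quartering
+ # its memory use, which is what lets 16384 context fit next to 5bpw weights.
+ cache = ExLlamaV2Cache_Q4(model, max_seq_len=16384, lazy=True)
+ model.load_autosplit(cache, progress=True)  # splits layers across available GPUs
+ tokenizer = ExLlamaV2Tokenizer(config)
+
+ generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
+ print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=200))
+ ```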
27
+ Cheers.
+
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
 
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

 <p align="center">
+ <img width="80%" src="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/resolve/main/figures/benchmark.jpg">
 </p>

 ## 2. Model Summary
 
 ```

 ## 9. Contact
+ If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).