Update README.md

README.md CHANGED

@@ -1,3 +1,31 @@
+---
+license: apache-2.0
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+---
+# DeepSeek-R1-exl2
+Original model: [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
+Model creator: [DeepSeek](https://huggingface.co/deepseek-ai)
+
+## Quants
+[4bpw h6 (main)](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/main)
+[4.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/4.5bpw-h6)
+[5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5bpw-h6)
+[5.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5.5bpw-h6)
+[6bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/6bpw-h6)
+[8bpw h8](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/8bpw-h8)
+
+## Quantization notes
+Made with Exllamav2 0.2.7 using the standard calibration dataset.
+Exl2 quants can be used with apps that support the exllamav2 library, such as TabbyAPI, Text-Generation-WebUI, and LoLLMs, and possibly with KoboldAI (not KoboldCpp).
+On Windows, exl2 quants require an Nvidia RTX card; on Linux, both RTX cards and AMD ROCm cards can be used.
+For the best performance, the model must fit entirely in GPU VRAM.
+On Windows, the model may still be usable if the Nvidia driver offloads it *slightly* to system RAM, but with a significant performance loss.
+On Linux, or with a multi-GPU setup on Windows, it will simply crash with an out-of-memory error, since the library doesn't natively support offloading.
+If offloading is required, please use GGUF quants instead.
+On my RTX3060/12GB machine, I can load this model with about 16384 tokens of context at 5bpw with Q4 cache enabled.
+Cheers.
+
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->

@@ -57,7 +85,7 @@ DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
 
 <p align="center">
-<img width="80%" src="figures/benchmark.jpg">
+<img width="80%" src="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/resolve/main/figures/benchmark.jpg">
 </p>
 
 ## 2. Model Summary

@@ -206,4 +234,4 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 ```
 
 ## 9. Contact
-If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).