cgus committed
Commit dbe0386 · verified · 1 Parent(s): d659990

Update README.md

Files changed (1): README.md (+30 -2)
README.md CHANGED
@@ -1,3 +1,31 @@
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
@@ -57,7 +85,7 @@ DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

 <p align="center">
- <img width="80%" src="figures/benchmark.jpg">
 </p>

 ## 2. Model Summary
@@ -206,4 +234,4 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 ```

 ## 9. Contact
- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
 
+ ---
+ license: apache-2.0
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+ ---
+ # DeepSeek-R1-exl2
+ Original model: [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)
+ Model creator: [DeepSeek](https://huggingface.co/deepseek-ai)
+
+ ## Quants
+ [4bpw h6 (main)](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/main)
+ [4.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/4.5bpw-h6)
+ [5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5bpw-h6)
+ [5.5bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/5.5bpw-h6)
+ [6bpw h6](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/6bpw-h6)
+ [8bpw h8](https://huggingface.co/cgus/DeepSeek-R1-Distill-Qwen-14B-exl2/tree/8bpw-h8)
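+
+ A minimal sketch of fetching one of the branches above with the `huggingface_hub` Python library; the `local_dir` value is just an example destination, not a required path:
+ ```python
+ # Download one quant branch of this repo (assumes: pip install huggingface_hub)
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="cgus/DeepSeek-R1-Distill-Qwen-14B-exl2",
+     revision="5bpw-h6",  # any branch name from the list above
+     local_dir="DeepSeek-R1-Distill-Qwen-14B-exl2-5bpw",  # example local path
+ )
+ ```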
+
+ ## Quantization notes
+ Made with Exllamav2 0.2.7 using the standard calibration dataset.
+ Exl2 quants can be used with apps that support the exllamav2 library, such as TabbyAPI, Text-Generation-WebUI, and LoLLMs, and possibly with KoboldAI (not KoboldCpp).
+ Exl2 quants require an Nvidia RTX card on Windows. On Linux, both Nvidia RTX and AMD ROCm cards can be used.
+ For the best performance, the model must fit entirely in GPU VRAM.
+ On Windows, the model might still be usable if the Nvidia driver offloads a *small* amount to system RAM, but with a significant performance loss.
+ On Linux, or with a multi-GPU setup on Windows, it will simply crash with an out-of-memory (OOM) error, since the library doesn't support offloading natively.
+ If offloading is required, please use GGUF quants instead.
+ On my RTX 3060 12GB machine I can load the 5bpw quant with about 16384 context when Q4 cache is enabled; a minimal sketch of such a setup follows below.
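+
+ As a rough illustration of those numbers, here is a minimal sketch of loading an exl2 quant directly with the exllamav2 Python library, following its documented quickstart pattern; the model path is an assumption, and frontends like TabbyAPI handle all of this for you:
+ ```python
+ # Sketch: load an exl2 quant at 16384 context with Q4 cache, then generate.
+ # Assumes exllamav2 is installed and the quant was downloaded locally.
+ from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
+ from exllamav2.generator import ExLlamaV2DynamicGenerator
+
+ model_dir = "DeepSeek-R1-Distill-Qwen-14B-exl2-5bpw"  # example local path
+ config = ExLlamaV2Config(model_dir)
+ model = ExLlamaV2(config)
+ # Q4 cache stores the KV cache in ~4 bits instead of FP16, roughly quartering
+ # its memory use, which is what lets 16384 context fit next to 5bpw weights.
+ cache = ExLlamaV2Cache_Q4(model, max_seq_len=16384, lazy=True)
+ model.load_autosplit(cache, progress=True)  # splits layers across available GPUs
+ tokenizer = ExLlamaV2Tokenizer(config)
+
+ generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
+ print(generator.generate(prompt="Why is the sky blue?", max_new_tokens=200))
+ ```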
27
+ Cheers.
+
 # DeepSeek-R1
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
 
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

 <p align="center">
+ <img width="80%" src="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/resolve/main/figures/benchmark.jpg">
 </p>

 ## 2. Model Summary
 
 ```

 ## 9. Contact
+ If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).