Add files using upload-large-folder tool
Browse files- .gitattributes +1 -1
- LICENSE +21 -0
- README.md +71 -44
- config.json +6 -5
- figures/benchmark.jpg +3 -0
- generation_config.json +2 -4
.gitattributes
CHANGED
@@ -33,4 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
-
|
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
figures/benchmark.jpg filter=lfs diff=lfs merge=lfs -text
|
LICENSE
ADDED
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
MIT License
|
2 |
+
|
3 |
+
Copyright (c) 2023 DeepSeek
|
4 |
+
|
5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6 |
+
of this software and associated documentation files (the "Software"), to deal
|
7 |
+
in the Software without restriction, including without limitation the rights
|
8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9 |
+
copies of the Software, and to permit persons to whom the Software is
|
10 |
+
furnished to do so, subject to the following conditions:
|
11 |
+
|
12 |
+
The above copyright notice and this permission notice shall be included in all
|
13 |
+
copies or substantial portions of the Software.
|
14 |
+
|
15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21 |
+
SOFTWARE.
|
README.md
CHANGED
@@ -1,52 +1,50 @@
|
|
1 |
---
|
2 |
-
|
3 |
-
language:
|
4 |
-
- en
|
5 |
-
license: llama3.3
|
6 |
library_name: transformers
|
7 |
-
tags:
|
8 |
-
- deepseek
|
9 |
-
- unsloth
|
10 |
-
- transformers
|
11 |
-
- llama
|
12 |
-
- llama-3
|
13 |
-
- meta
|
14 |
---
|
|
|
|
|
|
|
|
|
15 |
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
-
|
31 |
-
|-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
|
32 |
-
| **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
|
33 |
-
| **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
|
34 |
-
| **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
|
35 |
-
| **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
|
36 |
-
| **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
|
37 |
-
| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
|
38 |
-
| **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
|
39 |
-
| **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
|
40 |
|
41 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
42 |
|
43 |
-
|
44 |
-
|
45 |
-
|
|
|
|
|
46 |
|
47 |
-
## Special Thanks
|
48 |
-
A huge thank you to the DeepSeek team for creating and releasing these models.
|
49 |
|
|
|
|
|
|
|
50 |
|
51 |
|
52 |
## 1. Introduction
|
@@ -59,6 +57,8 @@ we introduce DeepSeek-R1, which incorporates cold-start data before RL.
|
|
59 |
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
|
60 |
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
|
61 |
|
|
|
|
|
62 |
<p align="center">
|
63 |
<img width="80%" src="figures/benchmark.jpg">
|
64 |
</p>
|
@@ -95,7 +95,7 @@ To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSe
|
|
95 |
</div>
|
96 |
|
97 |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
|
98 |
-
For more details
|
99 |
|
100 |
### DeepSeek-R1-Distill Models
|
101 |
|
@@ -184,6 +184,8 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c
|
|
184 |
|
185 |
Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally.
|
186 |
|
|
|
|
|
187 |
### DeepSeek-R1-Distill Models
|
188 |
|
189 |
DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
|
@@ -194,7 +196,23 @@ For instance, you can easily start a service using [vLLM](https://github.com/vll
|
|
194 |
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
|
195 |
```
|
196 |
|
197 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
198 |
|
199 |
## 7. License
|
200 |
This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
|
@@ -205,8 +223,17 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
|
|
205 |
|
206 |
## 8. Citation
|
207 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
208 |
|
209 |
```
|
210 |
|
211 |
## 9. Contact
|
212 |
-
If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
|
|
|
1 |
---
|
2 |
+
license: mit
|
|
|
|
|
|
|
3 |
library_name: transformers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
4 |
---
|
5 |
+
# DeepSeek-R1
|
6 |
+
<!-- markdownlint-disable first-line-h1 -->
|
7 |
+
<!-- markdownlint-disable html -->
|
8 |
+
<!-- markdownlint-disable no-duplicate-header -->
|
9 |
|
10 |
+
<div align="center">
|
11 |
+
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
|
12 |
+
</div>
|
13 |
+
<hr>
|
14 |
+
<div align="center" style="line-height: 1;">
|
15 |
+
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
|
16 |
+
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
|
17 |
+
</a>
|
18 |
+
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
|
19 |
+
<img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20R1-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
20 |
+
</a>
|
21 |
+
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
|
22 |
+
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
23 |
+
</a>
|
24 |
+
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
25 |
|
26 |
+
<div align="center" style="line-height: 1;">
|
27 |
+
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
|
28 |
+
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
|
29 |
+
</a>
|
30 |
+
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
|
31 |
+
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
32 |
+
</a>
|
33 |
+
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
|
34 |
+
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
35 |
+
</a>
|
36 |
+
</div>
|
37 |
|
38 |
+
<div align="center" style="line-height: 1;">
|
39 |
+
<a href="https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE" style="margin: 2px;">
|
40 |
+
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
|
41 |
+
</a>
|
42 |
+
</div>
|
43 |
|
|
|
|
|
44 |
|
45 |
+
<p align="center">
|
46 |
+
<a href="https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf"><b>Paper Link</b>👁️</a>
|
47 |
+
</p>
|
48 |
|
49 |
|
50 |
## 1. Introduction
|
|
|
57 |
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
|
58 |
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
|
59 |
|
60 |
+
**NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.**
|
61 |
+
|
62 |
<p align="center">
|
63 |
<img width="80%" src="figures/benchmark.jpg">
|
64 |
</p>
|
|
|
95 |
</div>
|
96 |
|
97 |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
|
98 |
+
For more details regarding the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
|
99 |
|
100 |
### DeepSeek-R1-Distill Models
|
101 |
|
|
|
184 |
|
185 |
Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally.
|
186 |
|
187 |
+
**NOTE: Hugging Face's Transformers has not been directly supported yet.**
|
188 |
+
|
189 |
### DeepSeek-R1-Distill Models
|
190 |
|
191 |
DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
|
|
|
196 |
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
|
197 |
```
|
198 |
|
199 |
+
You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang)
|
200 |
+
|
201 |
+
```bash
|
202 |
+
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
|
203 |
+
```
|
204 |
+
|
205 |
+
### Usage Recommendations
|
206 |
+
|
207 |
+
**We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**
|
208 |
+
|
209 |
+
1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
|
210 |
+
2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
|
211 |
+
3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
|
212 |
+
4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
|
213 |
+
|
214 |
+
Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance.
|
215 |
+
**To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "\<think\>\n" at the beginning of every output.**
|
216 |
|
217 |
## 7. License
|
218 |
This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
|
|
|
223 |
|
224 |
## 8. Citation
|
225 |
```
|
226 |
+
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
|
227 |
+
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
|
228 |
+
author={DeepSeek-AI},
|
229 |
+
year={2025},
|
230 |
+
eprint={2501.12948},
|
231 |
+
archivePrefix={arXiv},
|
232 |
+
primaryClass={cs.CL},
|
233 |
+
url={https://arxiv.org/abs/2501.12948},
|
234 |
+
}
|
235 |
|
236 |
```
|
237 |
|
238 |
## 9. Contact
|
239 |
+
If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
|
config.json
CHANGED
@@ -1,12 +1,15 @@
|
|
1 |
{
|
2 |
-
"_name_or_path": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
|
3 |
"architectures": [
|
4 |
"LlamaForCausalLM"
|
5 |
],
|
6 |
"attention_bias": false,
|
7 |
"attention_dropout": 0.0,
|
8 |
"bos_token_id": 128000,
|
9 |
-
"eos_token_id":
|
|
|
|
|
|
|
|
|
10 |
"head_dim": 128,
|
11 |
"hidden_act": "silu",
|
12 |
"hidden_size": 8192,
|
@@ -18,7 +21,6 @@
|
|
18 |
"num_attention_heads": 64,
|
19 |
"num_hidden_layers": 80,
|
20 |
"num_key_value_heads": 8,
|
21 |
-
"pad_token_id": 128004,
|
22 |
"pretraining_tp": 1,
|
23 |
"rms_norm_eps": 1e-05,
|
24 |
"rope_scaling": {
|
@@ -31,8 +33,7 @@
|
|
31 |
"rope_theta": 500000.0,
|
32 |
"tie_word_embeddings": false,
|
33 |
"torch_dtype": "bfloat16",
|
34 |
-
"transformers_version": "4.
|
35 |
-
"unsloth_fixed": true,
|
36 |
"use_cache": true,
|
37 |
"vocab_size": 128256
|
38 |
}
|
|
|
1 |
{
|
|
|
2 |
"architectures": [
|
3 |
"LlamaForCausalLM"
|
4 |
],
|
5 |
"attention_bias": false,
|
6 |
"attention_dropout": 0.0,
|
7 |
"bos_token_id": 128000,
|
8 |
+
"eos_token_id": [
|
9 |
+
128001,
|
10 |
+
128008,
|
11 |
+
128009
|
12 |
+
],
|
13 |
"head_dim": 128,
|
14 |
"hidden_act": "silu",
|
15 |
"hidden_size": 8192,
|
|
|
21 |
"num_attention_heads": 64,
|
22 |
"num_hidden_layers": 80,
|
23 |
"num_key_value_heads": 8,
|
|
|
24 |
"pretraining_tp": 1,
|
25 |
"rms_norm_eps": 1e-05,
|
26 |
"rope_scaling": {
|
|
|
33 |
"rope_theta": 500000.0,
|
34 |
"tie_word_embeddings": false,
|
35 |
"torch_dtype": "bfloat16",
|
36 |
+
"transformers_version": "4.47.0.dev0",
|
|
|
37 |
"use_cache": true,
|
38 |
"vocab_size": 128256
|
39 |
}
|
figures/benchmark.jpg
ADDED
![]() |
Git LFS Details
|
generation_config.json
CHANGED
@@ -1,11 +1,9 @@
|
|
1 |
{
|
2 |
"_from_model_config": true,
|
3 |
"bos_token_id": 128000,
|
4 |
-
"do_sample": true,
|
5 |
"eos_token_id": 128001,
|
6 |
-
"
|
7 |
-
"pad_token_id": 128004,
|
8 |
"temperature": 0.6,
|
9 |
"top_p": 0.95,
|
10 |
-
"transformers_version": "4.
|
11 |
}
|
|
|
1 |
{
|
2 |
"_from_model_config": true,
|
3 |
"bos_token_id": 128000,
|
|
|
4 |
"eos_token_id": 128001,
|
5 |
+
"do_sample": true,
|
|
|
6 |
"temperature": 0.6,
|
7 |
"top_p": 0.95,
|
8 |
+
"transformers_version": "4.39.3"
|
9 |
}
|