Add pipeline tag #86
by nielsr (HF staff) - opened

README.md CHANGED
@@ -1,6 +1,9 @@
---
library_name: transformers
+ pipeline_tag: text-generation
---
+
+ ```markdown
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
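The new `pipeline_tag: text-generation` entry is what lists the repository under the Hub's text-generation task filter, and it is also exposed through the Hub API. A minimal sketch of reading the metadata back with the `huggingface_hub` client (attribute names follow its `ModelInfo` object; the tag value assumes this change is merged):

```python
# Sketch: read the model-card metadata touched by this PR via the Hub API.
# Requires `pip install huggingface_hub`.
from huggingface_hub import model_info

info = model_info("deepseek-ai/DeepSeek-V3")
print(info.library_name)   # "transformers", from the existing metadata
print(info.pipeline_tag)   # "text-generation" once this change is merged
```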
@@ -96,12 +99,13 @@ Throughout the entire training process, we did not experience any irrecoverable

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
- | DeepSeek-V3-Base | 671B | 37B | 128K | [🤗
- | DeepSeek-V3 | 671B | 37B | 128K | [🤗
+ | DeepSeek-V3-Base | 671B | 37B | 128K | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) |
+ | DeepSeek-V3 | 671B | 37B | 128K | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3) |

</div>

-
+ > [!NOTE]
+ > The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.

To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How_to Run_Locally](#6-how-to-run-locally).
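The download links in the table point at the Hub repositories. A minimal sketch for fetching a checkpoint locally with `huggingface_hub` (the target directory is illustrative, and the full checkpoint is several hundred gigabytes):

```python
# Sketch: download the DeepSeek-V3 weights referenced in the table above.
# The local_dir value is illustrative; adjust it to your own storage layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3",
    local_dir="/path/to/DeepSeek-V3",
)
```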
@@ -132,7 +136,7 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md
| | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 |
- | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 |
+ | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | **82.9** |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** |
@@ -154,8 +158,9 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

</div>

-
-
+ > [!NOTE]
+ > Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
+ > For more evaluation details, please check our paper.

#### Context Window
<p align="center">
@@ -198,15 +203,10 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V3 pe

Note: All models are evaluated in a configuration that limits the output length to 8K. Benchmarks containing fewer than 1000 samples are tested multiple times using varying temperature settings to derive robust final results. DeepSeek-V3 stands as the best-performing open-source model, and also exhibits competitive performance against frontier closed-source models.

- </div>
-
-
#### Open Ended Generation Evaluation

<div align="center">

-
-
| Model | Arena-Hard | AlpacaEval 2.0 |
|-------|------------|----------------|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
@@ -219,7 +219,6 @@ Note: All models are evaluated in a configuration that limits the output length
Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
</div>

-
## 5. Chat Website & API Platform
You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
@@ -233,7 +232,7 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source
2. **SGLang**: Fully support the DeepSeek-V3 model in both BF16 and FP8 inference modes.
3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
- 5. **vLLM**: Support
+ 5. **vLLM**: Support DeepSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
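For the vLLM route in item 5 above, a hedged offline-inference sketch; the parallelism degree and sampling settings are placeholders that depend on the available hardware, and vLLM's own DeepSeek-V3 documentation remains the authoritative reference:

```python
# Sketch: offline generation with vLLM (item 5 above).
# tensor_parallel_size and the sampling values are illustrative, not recommendations.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,   # match the number of GPUs in the node
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain multi-token prediction in one paragraph."], params)
print(outputs[0].outputs[0].text)
```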
@@ -246,7 +245,8 @@ cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```

-
+ > [!NOTE]
+ > Hugging Face's Transformers has not been directly supported yet.

### 6.1 Inference with DeepSeek-Infer Demo (example only)
@@ -269,7 +269,7 @@ Download the model weights from HuggingFace, and put them into `/path/to/DeepSee

#### Model Weights Conversion

- Convert
+ Convert Hugging Face model weights to a specific format:

```shell
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
@@ -302,7 +302,6 @@ Here are the launch instructions from the SGLang team: https://github.com/sgl-pr

For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: https://github.com/InternLM/lmdeploy/issues/2960

-
### 6.4 Inference with TRT-LLM (recommended)

[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3.
@@ -318,7 +317,6 @@ In collaboration with the AMD team, we have achieved Day-One support for AMD GPU
### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs
The [MindIE](https://www.hiascend.com/en/software/mindie) framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the [instructions here](https://modelers.cn/models/MindIE/deepseekv3).

-
## 7. License
This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V3 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V3 series (including Base and Chat) supports commercial use.
@@ -336,4 +334,5 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
```

## 9. Contact
- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+ If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+ ```