Commit f9a71c1 · verified · 1 Parent(s): 1d044fd
nielsr (HF staff) committed

Add pipeline tag

This PR makes sure the model can be found at https://huggingface.co/models?pipeline_tag=text-generation&sort=trending.
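
For context (not part of the diff below): `pipeline_tag` in the card's YAML frontmatter is what the Hub's task filter keys on. A minimal sketch of checking discoverability with the `huggingface_hub` client; the `task`, `search`, and `limit` parameters reflect recent versions of its `list_models` API and may differ in older releases:

```python
# Hedged sketch: confirm the model appears under the text-generation task filter.
# Assumes a recent `huggingface_hub`; exact parameter names may vary by version.
from huggingface_hub import list_models

models = list_models(
    task="text-generation",            # maps to the pipeline_tag filter on the Hub
    search="deepseek-ai/DeepSeek-V3",  # narrow the listing to this repo
    limit=5,
)
for model in models:
    print(model.id, model.pipeline_tag)
```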

Files changed (1)
  1. README.md (+17 -18)
README.md CHANGED
@@ -1,6 +1,9 @@
---
library_name: transformers
+ pipeline_tag: text-generation
---
+
+ ```markdown
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
@@ -96,12 +99,13 @@ Throughout the entire training process, we did not experience any irrecoverable

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
- | DeepSeek-V3-Base | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) |
- | DeepSeek-V3 | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3) |
+ | DeepSeek-V3-Base | 671B | 37B | 128K | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) |
+ | DeepSeek-V3 | 671B | 37B | 128K | [🤗 Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3) |

</div>

- **NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.**
+ > [!NOTE]
+ > The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.

To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How_to Run_Locally](#6-how-to-run-locally).

@@ -132,7 +136,7 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md
| | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 |
- | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | **82.7** | **82.9** |
+ | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | **82.9** |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** |
@@ -154,8 +158,9 @@ For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md

</div>

- Note: Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
- For more evaluation details, please check our paper.
+ > [!NOTE]
+ > Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
+ > For more evaluation details, please check our paper.

#### Context Window
<p align="center">
@@ -198,15 +203,10 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V3 pe

Note: All models are evaluated in a configuration that limits the output length to 8K. Benchmarks containing fewer than 1000 samples are tested multiple times using varying temperature settings to derive robust final results. DeepSeek-V3 stands as the best-performing open-source model, and also exhibits competitive performance against frontier closed-source models.

- </div>
-
-
#### Open Ended Generation Evaluation

<div align="center">

-
-
| Model | Arena-Hard | AlpacaEval 2.0 |
|-------|------------|----------------|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
@@ -219,7 +219,6 @@ Note: All models are evaluated in a configuration that limits the output length
Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.
</div>

-
## 5. Chat Website & API Platform
You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)

@@ -233,7 +232,7 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source
2. **SGLang**: Fully support the DeepSeek-V3 model in both BF16 and FP8 inference modes.
3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
- 5. **vLLM**: Support DeekSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
+ 5. **vLLM**: Support DeepSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.

@@ -246,7 +245,8 @@ cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```

- **NOTE: Huggingface's Transformers has not been directly supported yet.**
+ > [!NOTE]
+ > Hugging Face's Transformers has not been directly supported yet.

### 6.1 Inference with DeepSeek-Infer Demo (example only)

@@ -269,7 +269,7 @@ Download the model weights from HuggingFace, and put them into `/path/to/DeepSee

#### Model Weights Conversion

- Convert HuggingFace model weights to a specific format:
+ Convert Hugging Face model weights to a specific format:

```shell
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
@@ -302,7 +302,6 @@ Here are the launch instructions from the SGLang team: https://github.com/sgl-pr

For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: https://github.com/InternLM/lmdeploy/issues/2960

-
### 6.4 Inference with TRT-LLM (recommended)

[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3.
@@ -318,7 +317,6 @@ In collaboration with the AMD team, we have achieved Day-One support for AMD GPU
### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs
The [MindIE](https://www.hiascend.com/en/software/mindie) framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the [instructions here](https://modelers.cn/models/MindIE/deepseekv3).

-
## 7. License
This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V3 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V3 series (including Base and Chat) supports commercial use.

@@ -336,4 +334,5 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
```

## 9. Contact
- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+ If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+ ```
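
For reference, the README's YAML frontmatter after this change (the first hunk in the diff above) reads:

```yaml
---
library_name: transformers
pipeline_tag: text-generation
---
```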