luofuli committed
Commit: e91ebd8
Parent: 4621eeb

Update README.md

Files changed (1):
  1. README.md +7 -71
README.md CHANGED
@@ -85,24 +85,7 @@ Due to the constraints of HuggingFace, the open-source code currently experience
 
 ## 4. Evaluation Results
 ### Base Model
- #### Standard Benchmark (Models larger than 67B)
-
- <div align="center">
-
- | **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** |
- |:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
- | **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
- | **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
- | **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
- | **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
- | **HumanEval** | Code | 48.2 | 53.1 | 45.1 | 48.8 |
- | **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
- | **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
- | **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |
-
- </div>
-
- #### Standard Benchmark (Models smaller than 16B)
+ #### Standard Benchmark
 <div align="center">
 
 | **Benchmark** | **Domain** | **DeepSeek 7B (Dense)** | **DeepSeekMoE 16B** | **DeepSeek-V2-Lite (MoE-16B)** |
@@ -120,32 +103,9 @@ Due to the constraints of HuggingFace, the open-source code currently experience
 </div>
 For more evaluation details, such as few-shot settings and prompts, please check our paper.
 
- #### Context Window
- <p align="center">
-   <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png?raw=true">
- </p>
-
- Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to **128K**.
 
 ### Chat Model
- #### Standard Benchmark (Models larger than 67B)
- <div align="center">
-
- | Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
- |:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
- | **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
- | **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
- | **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
- | **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
- | **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
- | **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
- | **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
- | **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
- | **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |
-
- </div>
-
- #### Standard Benchmark (Models smaller than 16B)
+ #### Standard Benchmark
 
 <div align="center">
 
@@ -162,12 +122,6 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 pe
 
 </div>
 
- #### English Open Ended Generation Evaluation
- We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
- <p align="center">
-   <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png?raw=true" />
- </p>
-
 #### Chinese Open Ended Generation Evaluation
 **Alignbench** (https://arxiv.org/abs/2311.18743)
 <div align="center">
@@ -185,17 +139,10 @@ We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive per
 | DeepSeek-67B-Chat | 开源 | 6.43 | 5.75 | 7.11 |
 | Yi-34B-Chat (零一万物) | 开源 | 6.12 | 4.86 | 7.38 |
 | gpt-3.5-turbo-0613 | 闭源 | 6.08 | 5.35 | 6.71 |
- | DeepSeek-V2-Lite 16B Chat | 开源 | 6.01 | 4.71 | 7.32 |
+ | DeepSeek-V2-Lite 16B Chat (SFT) | 开源 | 6.01 | 4.71 | 7.32 |
 
 </div>
 
- #### Coding Benchmarks
- We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
-
- <p align="center">
-   <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png?raw=true">
- </p>
-
 ## 5. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
 - For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
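
The trailing context of the hunk above describes MLA's low-rank key-value compression. As a rough, self-contained sketch of that idea only, with every module name and dimension below invented for illustration rather than taken from DeepSeek's code:

```python
# Toy illustration of low-rank key-value compression (the idea behind MLA's
# smaller inference-time KV cache); NOT DeepSeek's actual implementation.
import torch
import torch.nn as nn

class LowRankKV(nn.Module):
    def __init__(self, hidden=4096, latent=512, n_heads=32, head_dim=128):
        super().__init__()
        self.down = nn.Linear(hidden, latent, bias=False)              # compress hidden state
        self.up_k = nn.Linear(latent, n_heads * head_dim, bias=False)  # expand latent to keys
        self.up_v = nn.Linear(latent, n_heads * head_dim, bias=False)  # expand latent to values

    def forward(self, h):
        c = self.down(h)  # (batch, seq, latent): only this small tensor needs caching
        return c, self.up_k(c), self.up_v(c)
```

Caching the small latent `c` rather than full per-head keys and values is what removes the KV-cache bottleneck the bullet refers to.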
@@ -204,19 +151,8 @@ DeepSeek-V2 adopts innovative architectures to guarantee economical training and
 <p align="center">
   <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png?raw=true" />
 </p>
- ## 6. Chat Website
- You can chat with the DeepSeek-V2 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
-
- ## 7. API Platform
- We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/). Sign up for over millions of free tokens. And you can also pay-as-you-go at an unbeatable price.
-
-
- <p align="center">
-   <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png?raw=true">
- </p>
 
- ## 8. How to run locally
- **To utilize DeepSeek-V2 in BF16 format for inference, 80GB*8 GPUs are required.**
+ ## 6. How to run locally
 
 **To utilize DeepSeek-V2-Lite in BF16 format for inference, 40GB*1 GPU is required.**
 ### Inference with Huggingface's Transformers
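
The renumbered "How to run locally" section above leads into inference with Huggingface's Transformers on a single 40GB GPU in BF16. A minimal sketch of such a load, assuming the `deepseek-ai/DeepSeek-V2-Lite` repository id and the stock `transformers` API (none of this code appears in the diff itself):

```python
# Illustrative sketch only: load the 16B Lite checkpoint in BF16 and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16, matching the stated hardware requirement
    device_map="auto",            # place weights on the available GPU
    trust_remote_code=True,       # DeepSeek-V2 ships custom modeling code
)

inputs = tokenizer("An attention function can be described as", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```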
@@ -326,10 +262,10 @@ llm = ChatOpenAI(
     temperature=0.85,
     max_tokens=8000)
 ```
- ## 9. License
+ ## 7. License
 This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V2 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V2 series (including Base and Chat) supports commercial use.
 
- ## 10. Citation
+ ## 8. Citation
 ```
 @misc{deepseekv2,
       title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
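
The hunk above shows only the tail of the README's LangChain example. A self-contained call of the same shape, in which the model name, endpoint, and placeholder key are assumptions rather than values taken from the diff:

```python
# Hypothetical completion of the truncated ChatOpenAI call; only temperature
# and max_tokens are visible in the diff, everything else is assumed.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",                   # assumed model name
    api_key="<your-deepseek-api-key>",       # placeholder, not a real key
    base_url="https://api.deepseek.com/v1",  # assumed OpenAI-compatible endpoint
    temperature=0.85,
    max_tokens=8000,
)

print(llm.invoke("Who are you?").content)
```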
@@ -341,5 +277,5 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
 }
 ```
 
- ## 11. Contact
+ ## 9. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
 