Update README.md

--- a/README.md
+++ b/README.md
@@ -52,16 +52,14 @@
 # DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

 ## 1. Introduction
-Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.

-
-<div style="display: flex; justify-content: center;">
-<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png?raw=true" style="height:300px; width:auto; margin-right:10px">
-<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png?raw=true" style="height:300px; width:auto; margin-left:10px">
-</div>
-</p>
+Last week’s release of DeepSeek-V2 and the buzz around it have ignited widespread interest in MLA (Multi-head Latent Attention)! Many in the community suggested open-sourcing a smaller MoE model for in-depth research, and now DeepSeek-V2-Lite is out:

-
+- 16B total parameters, 2.4B activated parameters, trained on 5.7T tokens
+- Outperforms 7B dense and 16B MoE models on many benchmarks
+- Deployable on a single 40G GPU, fine-tunable on 8x80G GPUs
+
+DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It adopts innovative architectures, including Multi-head Latent Attention (MLA) and DeepSeekMoE: MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at economical cost through sparse computation.

 ## 2. News

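To make the "compressing the Key-Value (KV) cache into a latent vector" claim in the added introduction above more concrete, here is a minimal sketch of the latent-KV idea behind MLA. It is an illustration under stated assumptions rather than DeepSeek-V2's implementation: RoPE decoupling, the projection-absorption trick, and the real dimensions are omitted, and all names (`LatentKVAttention`, `d_latent`, ...) are invented for the example.

```python
# Minimal sketch of the latent KV-cache idea behind MLA (Multi-head Latent
# Attention). Illustrative only: RoPE decoupling, absorbed projections, and
# DeepSeek-V2's real dimensions are omitted; all names here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        # Down-projection: only this small per-token latent is cached,
        # instead of full per-head keys and values.
        self.W_down_kv = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections reconstruct keys/values from the cached latent.
        self.W_up_k = nn.Linear(d_latent, d_model, bias=False)
        self.W_up_v = nn.Linear(d_latent, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        # x: [batch, new_tokens, d_model]
        b, t, _ = x.shape
        c_kv = self.W_down_kv(x)                              # [b, t, d_latent]
        if latent_cache is not None:
            c_kv = torch.cat([latent_cache, c_kv], dim=1)     # extend latent cache
        q = self.W_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.W_up_k(c_kv).view(b, c_kv.size(1), self.n_heads, self.d_head).transpose(1, 2)
        v = self.W_up_v(c_kv).view(b, c_kv.size(1), self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask only matters for the no-cache (prefill) case in this sketch.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.W_o(out), c_kv                            # caller caches only c_kv
```

Per token, the cache in this toy setup holds `d_latent` values (128) instead of `2 * n_heads * d_head` (2048) for full keys and values, which is where the KV-cache reduction in MLA-style attention comes from.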
@@ -70,6 +68,8 @@ We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 tr

 ## 3. Model Downloads

+With DeepSeek-V2, we are open-sourcing base and chat models across two sizes:
+
 <div align="center">

 | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
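For the checkpoints introduced in the downloads table above, here is a hedged sketch of how one might load and query a chat model with Hugging Face Transformers. The repo id, the bf16 dtype, and the single-GPU fit are assumptions prompted by the "single 40G GPU" bullet for DeepSeek-V2-Lite; the official model cards remain the authoritative loading recipe.

```python
# Hedged usage sketch for the released checkpoints; repo id and dtype are
# assumptions, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~2 bytes/param, so 16B params is roughly 32 GB of weights
    device_map="auto",            # requires the `accelerate` package
    trust_remote_code=True,       # assumed: the repo ships custom MLA/MoE modeling code
)

messages = [{"role": "user", "content": "Explain Multi-head Latent Attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

Note that the single-GPU claim applies to the Lite variants; the full 236B DeepSeek-V2 checkpoints amount to roughly 470 GB of weights in bf16 and therefore need multi-GPU serving.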