luofuli committed on
Commit 89a2dbc
1 Parent(s): 1e90cb3

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -52,16 +52,14 @@
  # DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

  ## 1. Introduction
- Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.

- <p align="center">
- <div style="display: flex; justify-content: center;">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png?raw=true" style="height:300px; width:auto; margin-right:10px">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png?raw=true" style="height:300px; width:auto; margin-left:10px">
- </div>
- </p>
+ Last week, the release and buzz around DeepSeek-V2 have ignited widespread interest in MLA (Multi-head Latent Attention)! Many in the community suggested open-sourcing a smaller MoE model for in-depth research. And now DeepSeek-V2-Lite comes out:

- We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
+ - 16B total params, 2.4B active params, 5.7T training tokens
+ - Outperforms 7B dense and 16B MoE on many benchmarks
+ - Deployable on single 40G GPU, fine-tunable on 8x80G GPUs
+
+ DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

  ## 2. News

@@ -70,6 +68,8 @@ We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 tr

  ## 3. Model Downloads

+ With DeepSeek-V2, we are open-sourcing base and chat models across two sizes:
+
  <div align="center">

  | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
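The updated introduction above credits MLA with keeping inference efficient by compressing the Key-Value (KV) cache into a latent vector. As a rough illustration of that idea, here is a minimal PyTorch sketch of low-rank KV compression. It is not the DeepSeek-V2 implementation: the class name `LatentKVAttention`, the dimension names, and the omission of RoPE and the paper's decoupled key path are simplifying assumptions.

```python
# Minimal sketch of low-rank KV-cache compression in the spirit of MLA.
# NOT the DeepSeek-V2 implementation; names, shapes, and the omission of
# RoPE / the decoupled key path are simplifications for illustration only.
from typing import Optional, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentKVAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_heads: int, head_dim: int, kv_latent_dim: int):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        self.q_proj = nn.Linear(hidden_dim, num_heads * head_dim, bias=False)
        # Down-projection: only this small latent vector is cached per token,
        # instead of full per-head keys and values.
        self.kv_down = nn.Linear(hidden_dim, kv_latent_dim, bias=False)
        # Up-projections rebuild keys/values from the cached latent at attention time.
        self.k_up = nn.Linear(kv_latent_dim, num_heads * head_dim, bias=False)
        self.v_up = nn.Linear(kv_latent_dim, num_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * head_dim, hidden_dim, bias=False)

    def forward(
        self, x: torch.Tensor, kv_cache: Optional[torch.Tensor] = None
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        b, t, _ = x.shape
        latent = self.kv_down(x)                      # (b, t, kv_latent_dim)
        if kv_cache is not None:                      # grow the compressed cache
            latent = torch.cat([kv_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = out.transpose(1, 2).reshape(b, t, self.num_heads * self.head_dim)
        # The caller stores `latent` as the KV cache for the next decoding step.
        return self.o_proj(out), latent
```

The point is the cache footprint: each cached token stores `kv_latent_dim` values instead of `2 * num_heads * head_dim`, which is where the large KV-cache reduction comes from.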
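The Model Downloads table links to Hugging Face checkpoints, and the bullets note that the Lite model fits on a single 40G GPU. A common way to try it is the standard `transformers` loading pattern sketched below; the repository id, dtype, and generation settings are assumptions rather than an official recipe, so check the model card for the recommended configuration.

```python
# Generic Hugging Face `transformers` loading sketch for the Lite checkpoint.
# The repo id, dtype, and sampling settings are assumptions, not an official
# recommendation; consult the model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite"  # assumed repo id; see the download table

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,       # DeepSeek-V2 models ship custom modeling code
    torch_dtype=torch.bfloat16,   # bf16 keeps the 16B Lite model within a 40G GPU
    device_map="auto",
)

inputs = tokenizer("Multi-head Latent Attention is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```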