luofuli committed
Commit 4621eeb
1 Parent(s): 89a2dbc

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -55,8 +55,8 @@
 
 Last week, the release and buzz around DeepSeek-V2 have ignited widespread interest in MLA (Multi-head Latent Attention)! Many in the community suggested open-sourcing a smaller MoE model for in-depth research. And now DeepSeek-V2-Lite comes out:
 
-- 16B total params, 2.4B active params, 5.7T training tokens
-- Outperforms 7B dense and 16B MoE on many benchmarks
+- 16B total params, 2.4B active params, scratch training with 5.7T tokens
+- Outperforms 7B dense and 16B MoE on many English & Chinese benchmarks
 - Deployable on single 40G GPU, fine-tunable on 8x80G GPUs
 
 DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.
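The "deployable on a single 40G GPU" bullet can be illustrated with a standard Hugging Face Transformers loading flow. The sketch below is not part of this commit; it assumes the checkpoint id `deepseek-ai/DeepSeek-V2-Lite` and that the 16B parameters in bfloat16 (roughly 32 GB of weights) fit on one 40G card:

```python
# Minimal sketch: load DeepSeek-V2-Lite on a single ~40GB GPU with Transformers.
# The repo id and memory estimate are assumptions, not taken from the diff above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 16B params in bf16 ~= 32 GB of weights
    device_map="auto",           # let Accelerate place the weights on the GPU
    trust_remote_code=True,      # custom MLA/MoE modeling code ships with the repo
)

inputs = tokenizer("An attention mechanism is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`trust_remote_code=True` is typically required here because the MLA and DeepSeekMoE layers are implemented in modeling code distributed with the model repository rather than in the transformers library itself.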