luofuli committed
Commit: e91ebd8
Parent: 4621eeb

Update README.md

Files changed (1):
  1. README.md +7 -71
README.md CHANGED
@@ -85,24 +85,7 @@ Due to the constraints of HuggingFace, the open-source code currently experience
 
 ## 4. Evaluation Results
 ### Base Model
- #### Standard Benchmark (Models larger than 67B)
-
- <div align="center">
-
- | **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** |
- |:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
- | **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
- | **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
- | **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
- | **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
- | **HumanEval** | Code | 48.2 | 53.1 | 45.1 | 48.8 |
- | **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
- | **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
- | **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |
-
- </div>
-
- #### Standard Benchmark (Models smaller than 16B)
+ #### Standard Benchmark
 <div align="center">
 
 | **Benchmark** | **Domain** | **DeepSeek 7B (Dense)** | **DeepSeekMoE 16B** | **DeepSeek-V2-Lite (MoE-16B)** |
@@ -120,32 +103,9 @@ Due to the constraints of HuggingFace, the open-source code currently experience
 </div>
 For more evaluation details, such as few-shot settings and prompts, please check our paper.
 
- #### Context Window
- <p align="center">
-   <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png?raw=true">
- </p>
-
- Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to **128K**.
 
 ### Chat Model
- #### Standard Benchmark (Models larger than 67B)
- <div align="center">
-
- | Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
- |:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
- | **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
- | **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
- | **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
- | **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
- | **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
- | **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
- | **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
- | **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
- | **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |
-
- </div>
-
- #### Standard Benchmark (Models smaller than 16B)
+ #### Standard Benchmark
 
 <div align="center">
 
@@ -162,12 +122,6 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 pe
 
 </div>
 
- #### English Open Ended Generation Evaluation
- We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
- <p align="center">
-   <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png?raw=true" />
- </p>
-
 #### Chinese Open Ended Generation Evaluation
 **Alignbench** (https://arxiv.org/abs/2311.18743)
 <div align="center">
@@ -185,17 +139,10 @@ We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive per
 | DeepSeek-67B-Chat | 开源 | 6.43 | 5.75 | 7.11 |
 | Yi-34B-Chat (零一万物) | 开源 | 6.12 | 4.86 | 7.38 |
 | gpt-3.5-turbo-0613 | 闭源 | 6.08 | 5.35 | 6.71 |
- | DeepSeek-V2-Lite 16B Chat | 开源 | 6.01 | 4.71 | 7.32 |
+ | DeepSeek-V2-Lite 16B Chat (SFT) | 开源 | 6.01 | 4.71 | 7.32 |
 
 </div>
 
- #### Coding Benchmarks
- We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
-
- <p align="center">
-   <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png?raw=true">
- </p>
-
 ## 5. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
 - For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
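
The trailing context of the hunk above describes MLA's low-rank key-value compression. As a rough, self-contained sketch of that idea only, with every module name and dimension below invented for illustration rather than taken from DeepSeek's code:

```python
# Toy illustration of low-rank key-value compression (the idea behind MLA's
# smaller inference-time KV cache); NOT DeepSeek's actual implementation.
import torch
import torch.nn as nn

class LowRankKV(nn.Module):
    def __init__(self, hidden=4096, latent=512, n_heads=32, head_dim=128):
        super().__init__()
        self.down = nn.Linear(hidden, latent, bias=False)              # compress hidden state
        self.up_k = nn.Linear(latent, n_heads * head_dim, bias=False)  # expand latent to keys
        self.up_v = nn.Linear(latent, n_heads * head_dim, bias=False)  # expand latent to values

    def forward(self, h):
        c = self.down(h)  # (batch, seq, latent): only this small tensor needs caching
        return c, self.up_k(c), self.up_v(c)
```

Caching the small latent `c` rather than full per-head keys and values is what removes the KV-cache bottleneck the bullet refers to.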
@@ -204,19 +151,8 @@ DeepSeek-V2 adopts innovative architectures to guarantee economical training and
 <p align="center">
   <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png?raw=true" />
 </p>
- ## 6. Chat Website
- You can chat with the DeepSeek-V2 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)
-
- ## 7. API Platform
- We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/). Sign up for over millions of free tokens. And you can also pay-as-you-go at an unbeatable price.
-
-
- <p align="center">
-   <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png?raw=true">
- </p>
 
- ## 8. How to run locally
- **To utilize DeepSeek-V2 in BF16 format for inference, 80GB*8 GPUs are required.**
+ ## 6. How to run locally
 
 **To utilize DeepSeek-V2-Lite in BF16 format for inference, 40GB*1 GPU is required.**
 ### Inference with Huggingface's Transformers
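
The renumbered "How to run locally" section above leads into inference with Huggingface's Transformers on a single 40GB GPU in BF16. A minimal sketch of such a load, assuming the `deepseek-ai/DeepSeek-V2-Lite` repository id and the stock `transformers` API (none of this code appears in the diff itself):

```python
# Illustrative sketch only: load the 16B Lite checkpoint in BF16 and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16, matching the stated hardware requirement
    device_map="auto",            # place weights on the available GPU
    trust_remote_code=True,       # DeepSeek-V2 ships custom modeling code
)

inputs = tokenizer("An attention function can be described as", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```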
@@ -326,10 +262,10 @@ llm = ChatOpenAI(
     temperature=0.85,
     max_tokens=8000)
 ```
- ## 9. License
+ ## 7. License
 This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V2 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V2 series (including Base and Chat) supports commercial use.
 
- ## 10. Citation
+ ## 8. Citation
 ```
 @misc{deepseekv2,
       title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
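
The hunk above shows only the tail of the README's LangChain example. A self-contained call of the same shape, in which the model name, endpoint, and placeholder key are assumptions rather than values taken from the diff:

```python
# Hypothetical completion of the truncated ChatOpenAI call; only temperature
# and max_tokens are visible in the diff, everything else is assumed.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",                   # assumed model name
    api_key="<your-deepseek-api-key>",       # placeholder, not a real key
    base_url="https://api.deepseek.com/v1",  # assumed OpenAI-compatible endpoint
    temperature=0.85,
    max_tokens=8000,
)

print(llm.invoke("Who are you?").content)
```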
@@ -341,5 +277,5 @@ This code repository is licensed under [the MIT License](LICENSE-CODE). The use
 }
 ```
 
- ## 11. Contact
+ ## 9. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
 