nielsr (HF Staff) committed · verified
Commit 24d878d · 1 parent: da5c664

Improve model card: Add paper abstract, project page, and metadata tags


This PR improves the model card by:
- Updating the link to the paper to point to the Hugging Face paper page.
- Including a direct link to the Hugging Face project collection page.
- Adding the full paper abstract for better context and understanding of the model's innovations.
- Adding relevant metadata tags (`llm`, `sparse-attention`, `on-device`, `end-device`) to enhance discoverability on the Hub.
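
For context, the `library_name: transformers` and `pipeline_tag: text-generation` metadata added here map onto the standard Transformers loading path. A minimal sketch is shown below; the repo id `openbmb/MiniCPM4-8B` is an assumption, so substitute the repository this card actually belongs to.

```python
# Minimal sketch of the loading path implied by the card metadata
# (library_name: transformers, pipeline_tag: text-generation).
# The repo id below is an assumption; use the repository this card belongs to.
from transformers import pipeline

generator = pipeline(
    "text-generation",            # matches the card's pipeline_tag
    model="openbmb/MiniCPM4-8B",  # assumed checkpoint id
    trust_remote_code=True,       # MiniCPM repos typically ship custom modeling code
)

print(generator("Explain sparse attention in one sentence.", max_new_tokens=64)[0]["generated_text"])
```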

Files changed (1)
  1. README.md (+15, -4)
README.md CHANGED
@@ -1,24 +1,35 @@
 ---
-license: apache-2.0
 language:
 - zh
 - en
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- llm
+- sparse-attention
+- on-device
+- end-device
 ---
+
 <div align="center">
 <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
 </div>
 
 <p align="center">
 <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
-<a href="https://arxiv.org/abs/2506.07900" target="_blank">Technical Report</a> |
+<a href="https://huggingface.co/papers/2506.07900" target="_blank">Paper</a> |
+<a href="https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b" target="_blank">Project Page</a> |
 <a href="https://mp.weixin.qq.com/s/KIhH2nCURBXuFXAtYRpuXg?poc_token=HBIsUWijxino8oJ5s6HcjcfXFRi0Xj2LJlxPYD9c">Join Us</a>
 </p>
 <p align="center">
 👋 Contact us in <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
 </p>
 
+# Paper Abstract
+
+This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and a data-efficient ternary LLM, BitCPM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Furthermore, we construct a hybrid reasoning model, MiniCPM4.1, which can be used in both deep reasoning mode and non-reasoning mode. Evaluation results demonstrate that MiniCPM4 and MiniCPM4.1 outperform similar-sized open-source models across benchmarks, with the 8B variants showing significant speed improvements on long sequence understanding and generation.
+
 ## What's New
 - [2025.06.06] **MiniCPM4** series are released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).🔥🔥🔥
 
@@ -47,7 +58,7 @@ MiniCPM 4 is an extremely efficient edge-side large model that has undergone eff
   - Efficient Training Engineering Optimization: Adopts FP8 low-precision computing technology combined with Multi-token Prediction training strategy
 
 - 📚 **High-Quality Training Data:**
-  - UltraClean -- High-quality Pre-training Data Filtering and Generation: Builds iterative data cleaning strategies based on efficient data verification, open-sourcing high-quality Chinese and English pre-training dataset [UltraFinweb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb)
+  - UltraClean -- High-quality Pre-training Data Filtering and Generation: Builds iterative data cleaning strategies based on efficient data verification, open-sourcing high-quality Chinese and English pre-training dataset [UltraFineweb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb)
   - UltraChat v2 -- High-quality Supervised Fine-tuning Data Generation: Constructs large-scale high-quality supervised fine-tuning datasets covering multiple dimensions including knowledge-intensive data, reasoning-intensive data, instruction-following data, long text understanding data, and tool calling data
 
 - ⚡ **Efficient Inference System:**
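
The abstract added above attributes much of MiniCPM4's long-context speedup to InfLLM v2, a trainable sparse attention mechanism in which each query attends to only a small set of selected key/value blocks. The sketch below illustrates that general block-selection idea in plain PyTorch; it is not the InfLLM v2 implementation, and the block size, the top-k selection budget, and the mean-pooled scoring rule are all assumptions made for illustration.

```python
# Illustrative sketch of block-sparse attention (the general idea behind
# trainable sparse attention described in the abstract); NOT InfLLM v2 itself.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, K, V, block_size=64, top_k_blocks=4):
    """q: (d,) single query; K, V: (seq_len, d).
    Attend only to the top-k key/value blocks whose mean-pooled key
    scores highest against the query."""
    seq_len, d = K.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Score each block by the dot product between q and its mean-pooled key.
    pad = n_blocks * block_size - seq_len
    K_pad = F.pad(K, (0, 0, 0, pad))                              # (n_blocks*block_size, d)
    block_keys = K_pad.view(n_blocks, block_size, d).mean(dim=1)  # (n_blocks, d)
    block_scores = block_keys @ q                                 # (n_blocks,)

    # Keep only the highest-scoring blocks (assumed selection budget).
    k = min(top_k_blocks, n_blocks)
    selected = torch.topk(block_scores, k).indices

    # Gather the token positions belonging to the selected blocks.
    idx = torch.cat([
        torch.arange(b * block_size, min((b + 1) * block_size, seq_len))
        for b in selected.tolist()
    ])

    # Standard scaled dot-product attention restricted to the selected tokens.
    attn = torch.softmax((K[idx] @ q) / d ** 0.5, dim=0)          # (len(idx),)
    return attn @ V[idx]                                          # (d,)

# Toy usage: 1,024 cached tokens, 128-dim head.
torch.manual_seed(0)
K, V = torch.randn(1024, 128), torch.randn(1024, 128)
q = torch.randn(128)
out = block_sparse_attention(q, K, V)
print(out.shape)  # torch.Size([128])
```

Because each step touches only `top_k_blocks * block_size` cached tokens rather than the full context, per-step compute and memory traffic stay roughly constant as the context grows, which is the kind of saving the abstract describes for long-sequence prefilling and decoding.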