SparseLLM/ProSparse-MiniCPM-1B-sft


Raincleared committed on May 28, 2024

Commit 65b96b0 · verified · 1 Parent(s): f2cd96e

Update README.md

Files changed (1)

README.md +1 -1


@@ -126,7 +126,7 @@ The acceleration effects of LLMs with different sparsity are displayed as follow
 | **ProSparse-13B**\* |        87.97        |        91.02         |         77.93         |        **8.67**         | **4.52** |      55.29       |        2.38         |       67.50       |         1.68         |
 |   **ProSparse-13B**   |        **88.80**        |        **91.11**         |         **78.28**         |          -          | - |      **53.78**       |        **2.44**         |       **66.73**       |         **1.70**         |
-**Notes**: For "Dense" settings, the "Inference Speed" is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.
+**Notes**: For "Dense" settings, the "Inference Speed" (token/sec) is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time (us) for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.
 ### Citation