Update README.md
README.md
CHANGED
@@ -126,7 +126,7 @@ The acceleration effects of LLMs with different sparsity are displayed as follow
 | **ProSparse-13B**\* | 87.97 | 91.02 | 77.93 | **8.67** | **4.52** | 55.29 | 2.38 | 67.50 | 1.68 |
 | **ProSparse-13B** | **88.80** | **91.11** | **78.28** | - | - | **53.78** | **2.44** | **66.73** | **1.70** |

-**Notes**: For "Dense" settings, the "Inference Speed" is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.
+**Notes**: For "Dense" settings, the "Inference Speed" (token/sec) is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time (us) for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.

 ### Citation