Raincleared committed on
Commit
65b96b0
·
verified ·
1 Parent(s): f2cd96e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -126,7 +126,7 @@ The acceleration effects of LLMs with different sparsity are displayed as follow
  | **ProSparse-13B**\* | 87.97 | 91.02 | 77.93 | **8.67** | **4.52** | 55.29 | 2.38 | 67.50 | 1.68 |
  | **ProSparse-13B** | **88.80** | **91.11** | **78.28** | - | - | **53.78** | **2.44** | **66.73** | **1.70** |
 
- **Notes**: For "Dense" settings, the "Inference Speed" is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.
+ **Notes**: For "Dense" settings, the "Inference Speed" (token/sec) is obtained by [llama.cpp](https://github.com/ggerganov/llama.cpp), and the time (us) for steps (2) and (3) is measured without sparse GPU operators. For other sparse settings, the "Inference Speed" is obtained by [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), and sparse GPU operators are applied. ProSparse settings with activation threshold shifting and the MiniCPM architecture are not supported by PowerInfer at present.
 
  ### Citation
 