SparseLLM
/

prosparse-llama-2-7b

@@ -101,8 +101,8 @@ First, we utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of
 Moreover, considering the potential inference inaccuracies caused by wrong predictions of activation predictors, we implement two sparse GPU [operators](https://github.com/Raincleared-Song/sparse_gpu_operator) for faster accurate inference utilizing activation sparsity. They are responsible for the speedup of two key steps in a gated FFN:
-- Step (2) `S2`: a fused operator of ReLU and \\(\mathbf{s} \odot (\mathbf{x} \mathbf{W}_1^T)\\);
-- Step (3) `S3`: a sparse matrix-vector multiplication operator \\(\mathbf{x}_1 \mathbf{W}_2^T\\).
 where \\(\mathbf{s}\\), \\(\mathbf{x}\\), \\(\mathbf{x}_1\\), and \\(\odot\\) denote the gating scores, the FFN input hidden states, the intermediate outputs, and the element-wise multiplication respectively. \\(\mathbf{W}_1\\) and \\(\mathbf{W}_2\\) are FFN weight matrices.

 Moreover, considering the potential inference inaccuracies caused by wrong predictions of activation predictors, we implement two sparse GPU [operators](https://github.com/Raincleared-Song/sparse_gpu_operator) for faster accurate inference utilizing activation sparsity. They are responsible for the speedup of two key steps in a gated FFN:
+- Step (2) (`S2`): a fused operator of ReLU and \\(\mathbf{s} \odot (\mathbf{x} \mathbf{W}_1^T)\\);
+- Step (3) (`S3`): a sparse matrix-vector multiplication operator \\(\mathbf{x}_1 \mathbf{W}_2^T\\).
 where \\(\mathbf{s}\\), \\(\mathbf{x}\\), \\(\mathbf{x}_1\\), and \\(\odot\\) denote the gating scores, the FFN input hidden states, the intermediate outputs, and the element-wise multiplication respectively. \\(\mathbf{W}_1\\) and \\(\mathbf{W}_2\\) are FFN weight matrices.