Raincleared
commited on
Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
@@ -101,8 +101,8 @@ First, we utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of
|
|
101 |
|
102 |
Moreover, considering the potential inference inaccuracies caused by wrong predictions of activation predictors, we implement two sparse GPU [operators](https://github.com/Raincleared-Song/sparse_gpu_operator) for faster accurate inference utilizing activation sparsity. They are responsible for the speedup of two key steps in a gated FFN:
|
103 |
|
104 |
-
- Step (2) `S2
|
105 |
-
- Step (3) `S3
|
106 |
|
107 |
where \\(\mathbf{s}\\), \\(\mathbf{x}\\), \\(\mathbf{x}_1\\), and \\(\odot\\) denote the gating scores, the FFN input hidden states, the intermediate outputs, and the element-wise multiplication respectively. \\(\mathbf{W}_1\\) and \\(\mathbf{W}_2\\) are FFN weight matrices.
|
108 |
|
|
|
101 |
|
102 |
Moreover, considering the potential inference inaccuracies caused by wrong predictions of activation predictors, we implement two sparse GPU [operators](https://github.com/Raincleared-Song/sparse_gpu_operator) for faster accurate inference utilizing activation sparsity. They are responsible for the speedup of two key steps in a gated FFN:
|
103 |
|
104 |
+
- Step (2) (`S2`): a fused operator of ReLU and \\(\mathbf{s} \odot (\mathbf{x} \mathbf{W}_1^T)\\);
|
105 |
+
- Step (3) (`S3`): a sparse matrix-vector multiplication operator \\(\mathbf{x}_1 \mathbf{W}_2^T\\).
|
106 |
|
107 |
where \\(\mathbf{s}\\), \\(\mathbf{x}\\), \\(\mathbf{x}_1\\), and \\(\odot\\) denote the gating scores, the FFN input hidden states, the intermediate outputs, and the element-wise multiplication respectively. \\(\mathbf{W}_1\\) and \\(\mathbf{W}_2\\) are FFN weight matrices.
|
108 |
|