Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction Paper • 2505.11254 • Published May 16 • 48
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 149
HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning Paper • 2406.09827 • Published Jun 14, 2024 • 2