Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 160
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 295
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 161
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28, 2024 • 32
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 68