MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313, published Jan 14)
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation (arXiv:2502.01068, published Feb 3)
Taming the Titans: A Survey of Efficient LLM Inference Serving (arXiv:2504.19720, published Apr 28)