Efficient LLM
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache Paper • 2401.02669 • Published Jan 5, 2024 • 16
Model Architecture
Mixtral of Experts Paper • 2401.04088 • Published Jan 8, 2024 • 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 72
TinyLlama: An Open-Source Small Language Model Paper • 2401.02385 • Published Jan 4, 2024 • 95
LLaMA Pro: Progressive LLaMA with Block Expansion Paper • 2401.02415 • Published Jan 4, 2024 • 54