MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. arXiv:2402.14905, published Feb 22.
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement. arXiv:2402.14160, published Feb 21.