Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 56
BitDelta: Your Fine-Tune May Only Be Worth One Bit Paper • 2402.10193 • Published Feb 15, 2024 • 23 • 5