- SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification — arXiv:2305.09781, published May 16, 2023
- GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism — arXiv:2308.10087, published Aug 19, 2023
- Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding — arXiv:2402.12374, published Feb 19, 2024
- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding — arXiv:2404.11912, published Apr 18, 2024
- SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices — arXiv:2406.02532, published Jun 4, 2024
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding — arXiv:2408.11049, published Aug 20, 2024