Universal Assisted Generation: Faster Decoding with Any Assistant Model • By danielkorat and 7 others • Oct 29, 2024 • 55
Assisted Generation: a new direction toward low-latency text generation • By joaogante • May 11, 2023 • 64
Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth • By mlabonne • Jul 29, 2024 • 329
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval • By aamirshakir and 2 others • Mar 22, 2024 • 92