S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 32
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published May 18, 2024 • 31
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22, 2024 • 10
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization Paper • 2401.18079 • Published Jan 31, 2024 • 7 • 2