- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (arXiv:2202.06671, published Feb 14, 2022)
- Efficient Long-context Language Model Training by Core Attention Disaggregation (arXiv:2510.18121, published Oct 20, 2025)
- QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization (arXiv:2506.22396, published Jun 27, 2025)
- ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering (arXiv:2508.07321, published Aug 10, 2025)
- Revisiting Bi-Linear State Transitions in Recurrent Neural Networks (arXiv:2505.21749, published May 27, 2025)
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (arXiv:2401.09670, published Jan 18, 2024)
- Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations (arXiv:2409.17264, published Sep 25, 2024)