A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published 6 days ago • 32
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs Paper • 2502.12085 • Published Feb 17, 2025 • 4
OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models Paper • 2307.03084 • Published Jul 5, 2023 • 1
OpenPrompt: An Open-source Framework for Prompt-learning Paper • 2111.01998 • Published Nov 3, 2021 • 1
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting Paper • 2402.13720 • Published Feb 21, 2024 • 7
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9, 2024 • 22
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Paper • 2410.01805 • Published Oct 2, 2024
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Paper • 2502.14856 • Published Feb 20, 2025 • 7
Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval Paper • 2305.14685 • Published May 24, 2023 • 1
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In Paper • 2305.17331 • Published May 27, 2023 • 1
Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression Paper • 2402.16058 • Published Feb 25, 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework Paper • 2408.01262 • Published Aug 2, 2024 • 1