LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published 8 days ago • 43
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 9 days ago • 234
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way Paper • 2312.00407 • Published Dec 1, 2023 • 3
DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels Paper • 2409.02465 • Published Sep 4, 2024 • 1
LongWanjuan: Towards Systematic Measurement for Long Text Quality Paper • 2402.13583 • Published Feb 21, 2024 • 1
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published 12 days ago • 20
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning Paper • 2505.15776 • Published May 21 • 10
VideoRoPE: What Makes for Good Video Rotary Position Embeddi Collection A storage repo for VideoRoPE. • 6 items • Updated 8 days ago • 3
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope Paper • 2407.15176 • Published Jul 21, 2024 • 3
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published Mar 2 • 13
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs Paper • 2502.14837 • Published Feb 20 • 4
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 54
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 65