- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 44
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
  Paper • 2412.14161 • Published • 52
- HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
  Paper • 2408.10945 • Published • 11
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54

Collections including paper arxiv:2404.07143

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 33

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 16
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 3
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 144

- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 116
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 23

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 68
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 44
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 110
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 41

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 615
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 101
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 106
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 44