Collections
Discover the best community collections!
Collections including paper arxiv:2402.17764
-
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 18 -
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 78 -
Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 22 -
Zoology: Measuring and Improving Recall in Efficient Language Models
Paper • 2312.04927 • Published • 2
-
Transformers are Multi-State RNNs
Paper • 2401.06104 • Published • 35 -
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 78 -
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 40 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 602
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12 -
More Agents Is All You Need
Paper • 2402.05120 • Published • 51 -
Scaling Laws for Forgetting When Fine-Tuning Large Language Models
Paper • 2401.05605 • Published -
Aligning Large Language Models with Counterfactual DPO
Paper • 2401.09566 • Published • 2
-
Scaling Laws for Downstream Task Performance of Large Language Models
Paper • 2402.04177 • Published • 17 -
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Paper • 2402.05546 • Published • 4 -
SaulLM-7B: A pioneering Large Language Model for Law
Paper • 2403.03883 • Published • 75 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 602
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 16 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 11 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 69 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45
-
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
Paper • 2401.17053 • Published • 30 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 30 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 69 -
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Paper • 2402.05930 • Published • 39
-
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 68 -
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
Paper • 2401.14112 • Published • 17 -
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Paper • 2401.13919 • Published • 25 -
Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation
Paper • 2401.14257 • Published • 9
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 18 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 19 -
Linear Transformers are Versatile In-Context Learners
Paper • 2402.14180 • Published • 6