Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • Paper 2503.09573 • Published Mar 12, 2025
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper 2502.08910 • Published Feb 13, 2025
Releasing the largest multilingual open pretraining dataset • Article by Pclanglais and 2 others • Published Nov 13, 2024
A failed experiment: Infini-Attention, and why we should keep trying? • Article by neuralink and 2 others • Published Aug 14, 2024
Welcome FalconMamba: The first strong attention-free 7B model • Article by JingweiZuo and 5 others • Published Aug 12, 2024
TransformerFAM: Feedback attention is working memory • Paper 2404.09173 • Published Apr 14, 2024
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent • Article by qgallouedec and 3 others • Published Apr 22, 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks • Paper 2402.04248 • Published Feb 6, 2024
Large Language Models as Generalizable Policies for Embodied Tasks • Paper 2310.17722 • Published Oct 26, 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization • Paper 2308.02151 • Published Aug 4, 2023