Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 11 days ago • 134
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • 23 days ago • 109
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control Paper • 2405.04798 • Published May 8, 2024 • 1
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 16 days ago • 45
Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Article • Jan 16 • 69
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 22 days ago • 195
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published 29 days ago • 26
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published 27 days ago • 27
Mastering Long Contexts in LLMs with KVPress Article • By nvidia and 1 other • Jan 23 • 63
Fine-tune ModernBERT for RAG with Synthetic Data Article • By sdiazlor and 2 others • Jan 20 • 36
Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI Paper • 2409.14160 • Published Sep 21, 2024 • 2