Metis: Training Large Language Models with Advanced Low-Bit Quantization • Paper 2509.00404 • Published Sep 2025
Jamba 1.7 Collection • The AI21 Jamba family of models is a set of hybrid SSM-Transformer foundation models, blending speed, efficient long-context processing, and accuracy. • 4 items • Updated Jul 2
BitVLA Collection • 1-bit Vision-Language-Action Models for Robotics Manipulation • 9 items • Updated Jun 30
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs • Paper 2504.18415 • Published Apr 25
BitNet Collection • 🔥 BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1
TransMamba: Flexibly Switching between Transformer and Mamba • Paper 2503.24067 • Published Mar 31
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use • Paper 2502.15872 • Published Feb 21
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam • Paper 2502.17055 • Published Feb 24