Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? arXiv:2502.11895 (Feb 17, 2025)
DeToNATION: Decoupled Torch Network-Aware Training on Interlinked Online Nodes. arXiv:2502.06728 (Feb 10, 2025)
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? arXiv:2312.04548 (Dec 7, 2023)
BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks. arXiv:2407.09527 (Jul 2024)
When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization. arXiv:2411.05882 (Nov 8, 2024)
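Several of the papers above (arXiv:2502.11895, arXiv:2407.09527, arXiv:2411.05882) concern BitNet-style 1.58-bit quantization, in which each weight is restricted to the ternary set {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per weight. As a rough illustration of what that means in practice, here is a minimal sketch of absmean ternary weight quantization as described in the BitNet b1.58 line of work; the function name, signature, and PyTorch framing are illustrative assumptions, not code from any of the listed papers.

```python
import torch

def absmean_ternary_quantize(
    w: torch.Tensor, eps: float = 1e-6
) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize weights to {-1, 0, +1} (~1.58 bits each) via the absmean scheme.

    Sketch of the scheme described for BitNet b1.58; this helper is
    illustrative, not taken from the listed papers.
    """
    # Per-tensor scale: mean absolute value of all weights.
    gamma = w.abs().mean()
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)
    # gamma is returned so the layer output can be rescaled after the
    # (now multiplication-free) ternary matmul.
    return w_q, gamma

# Example: quantize a random linear-layer weight matrix.
w = torch.randn(256, 512)
w_q, gamma = absmean_ternary_quantize(w)
assert set(w_q.unique().tolist()) <= {-1.0, 0.0, 1.0}
```

In quantization-aware pre-training, this forward quantization is typically paired with a straight-through estimator so gradients update latent 16-bit weights; the first paper's question of when to transition from 16-bit to 1.58-bit pre-training concerns exactly this kind of setup.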