# BitMamba-2-1B
BitMamba-2-1B is a scalable hybrid architecture that integrates 1.58-bit ternary (BitNet) quantization into the Mamba-2 state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, delivering competitive reasoning performance at a drastically reduced memory footprint.
## Key Features
- Architecture: Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
- Parameters: 1B.
- Precision: 1.58-bit ternary weights in {-1, 0, +1} (see the quantization sketch after this list).
- Training Tokens: 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- Hardware: Trained on Google Cloud TPU v6e.
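
The ternary weights follow BitNet b1.58's "absmean" scheme: each weight matrix is scaled by the mean of its absolute values, rounded, and clipped to {-1, 0, +1}. The JAX sketch below illustrates that published formula; it is an illustration only, not code taken from BitMamba-2's training pipeline.

```python
# Illustrative sketch of BitNet b1.58 "absmean" ternary quantization.
# Not the actual BitMamba-2 training kernel; shown only to explain the idea.
import jax.numpy as jnp

def ternary_quantize(w: jnp.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} plus a single absmean scale."""
    gamma = jnp.mean(jnp.abs(w)) + eps               # per-matrix absmean scale
    w_q = jnp.clip(jnp.round(w / gamma), -1.0, 1.0)  # ternary values
    return w_q, gamma

# At inference the ternary matrix is rescaled, e.g. y = x @ (w_q * gamma);
# during training, gradients pass through a straight-through estimator.
```

Keeping the scalar scale in higher precision is what lets the packed ternary weights be rescaled cheaply at inference time.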
## Benchmark Results
| Benchmark | Metric | BitMamba-2-1B | vs. 255M Baseline |
|---|---|---|---|
| ARC-Easy | Accuracy | 63.30% | +7.8% |
| PIQA | Accuracy | 68.77% | +4.4% |
| BoolQ | Accuracy | 62.35% | +3.1% |
| HellaSwag | Acc Norm | 45.59% | +10.4% |
| WikiText-2 | Perplexity | 29.62 | -22.1 |
Scaling from 255M to 1B parameters yields consistent improvements across all reported benchmarks.
## Usage (Inference)
This model is optimized for edge deployment using our custom C++ inference engine.
### 1. Download the Quantized Model
Download the `bitmamba_1b.bin` file from the Files tab (or the `bitmamba_cpp` folder).
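
If you prefer to script the download, the sketch below uses `huggingface_hub`; the `repo_id` here is a placeholder, so substitute the repository name shown on this model page and adjust the path if the file sits inside the `bitmamba_cpp` folder.

```python
# Sketch: programmatic download of the quantized binary with huggingface_hub.
# "your-username/BitMamba-2-1B" is a placeholder repo_id; replace it with the
# actual repository shown on this model page.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/BitMamba-2-1B",  # placeholder
    filename="bitmamba_1b.bin",             # or "bitmamba_cpp/bitmamba_1b.bin"
)
print(model_path)
```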
### 2. Run with C++

Get the inference code from our GitHub repository and compile `bitmamba.cpp`.

```bash
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
```
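
The quoted argument is the prompt as space-separated token IDs. The example IDs 15496 11 314 716 match the GPT-2 BPE encoding of "Hello, I am", so the sketch below assumes a GPT-2-compatible tokenizer; verify this against the tokenizer actually shipped in the repository.

```python
# Sketch: build the space-separated token-ID prompt string for the C++ binary.
# Assumes a GPT-2-compatible BPE vocabulary (an assumption on our part; the
# example IDs 15496 11 314 716 decode to "Hello, I am" under GPT-2's tokenizer).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
prompt_ids = tok.encode("Hello, I am")
print(" ".join(str(i) for i in prompt_ids))  # -> 15496 11 314 716
```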
### 3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
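
A minimal sketch for restoring the parameter pytree with `flax.serialization` follows; building and running the model itself still requires the modules in `src/`.

```python
# Sketch: restore the raw parameter pytree from bitmamba_1b.msgpack.
# This only loads and inspects the weights; the model definition lives in
# the src/ directory of the GitHub repository.
from flax import serialization

with open("bitmamba_1b.msgpack", "rb") as f:
    params = serialization.msgpack_restore(f.read())

# List the top-level parameter groups.
for name in params:
    print(name)
```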
## Efficient Deployment
Running on a consumer Intel Core i3-12100F CPU:
| Model | RAM Usage | Speed |
|---|---|---|
| BitMamba-2-1B | 621 MB | ~53 tok/s |
## Citation
```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```