BitMamba-2-1B


BitMamba-2-1B is a scalable, hybrid architecture that integrates 1.58-bit ternary quantization (BitNet) into the Mamba-2 state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.

⚑ Key Features

  • Architecture: Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
  • Parameters: 1B.
  • Precision: 1.58-bit (weights {-1, 0, 1}).
  • Training Tokens: 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
  • Hardware: Trained on Google Cloud TPU v6e.
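The ternary scheme constrains every weight to {-1, 0, 1} plus a per-tensor scale. A minimal sketch of the absmean quantization described in the BitNet b1.58 paper (the function name here is illustrative, not taken from this repo):

```python
import numpy as np

def absmean_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a float weight tensor to ternary {-1, 0, 1} plus a scale.

    scale = mean(|W|); W_q = clip(round(W / scale), -1, 1).
    Dequantized weights are approximately w_q * scale.
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q.astype(np.int8), scale

w = np.array([[0.4, -1.2, 0.02], [0.9, -0.1, -0.7]])
w_q, scale = absmean_quantize(w)
# w_q contains only -1, 0, and 1
```

Because each weight carries log2(3) ≈ 1.58 bits of information, this is where the "1.58-bit" figure comes from.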

πŸ“Š Benchmark Results

| Benchmark | Metric | BitMamba-2-1B | vs. 255M Baseline |
|---|---|---|---|
| ARC-Easy | Accuracy | 63.30% | +7.8% |
| PIQA | Accuracy | 68.77% | +4.4% |
| BoolQ | Accuracy | 62.35% | +3.1% |
| HellaSwag | Acc Norm | 45.59% | +10.4% |
| WikiText-2 | Perplexity | 29.62 | -22.1 |

For WikiText-2, lower perplexity is better; the delta is an absolute reduction.

Scaling from 255M to 1B parameters yields consistent improvements across every benchmark above.

*(Figure: scaling laws)*

πŸš€ Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the Files tab (or the `bitmamba_cpp` folder).

2. Run with C++

Go to our GitHub repository to get the inference code.

```shell
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
```

3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.

πŸ› οΈ Efficient Deployment

Running on a consumer Intel Core i3-12100F CPU:

| Model | RAM Usage | Speed |
|---|---|---|
| BitMamba-2-1B | 621 MB | ~53 tok/s |
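As a back-of-the-envelope sanity check on the footprint (assuming exactly 10^9 ternary weights packed at 2 bits each, a common packing for {-1, 0, 1}; the actual packing used by the engine is not specified here):

```python
# Ternary weights fit in 2 bits each when packed 4 per byte.
params = 1_000_000_000
packed_bytes = params * 2 // 8     # 250,000,000 bytes
packed_mib = packed_bytes / 2**20  # ~238.4 MiB for the weights alone
print(f"{packed_mib:.1f} MiB")     # prints "238.4 MiB"
```

The gap between this estimate and the measured 621 MB total would be taken up by runtime state (scales, embeddings, activations, KV-free SSM state), but that split is not broken out here.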

πŸ“œ Citation

@misc{salazar2026bitmamba2,
  author       = {Salazar, Jesus},
  title        = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18394665},
  url          = {https://doi.org/10.5281/zenodo.18394665}
}
