BitMamba-2-1B


BitMamba-2-1B is a scalable, hybrid architecture that integrates 1.58-bit ternary quantization (BitNet) into the Mamba-2 state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.

⚑ Key Features

  • Architecture: Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
  • Parameters: 1B.
  • Precision: 1.58-bit (weights {-1, 0, 1}).
  • Training Tokens: 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
  • Hardware: Trained on Google Cloud TPU v6e.
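The ternary scheme constrains every weight to {-1, 0, 1} plus a per-tensor scale. A minimal sketch of the absmean quantization described in the BitNet b1.58 paper (the function name here is illustrative, not taken from this repo):

```python
import numpy as np

def absmean_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a float weight tensor to ternary {-1, 0, 1} plus a scale.

    scale = mean(|W|); W_q = clip(round(W / scale), -1, 1).
    Dequantized weights are approximately w_q * scale.
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q.astype(np.int8), scale

w = np.array([[0.4, -1.2, 0.02], [0.9, -0.1, -0.7]])
w_q, scale = absmean_quantize(w)
# w_q contains only -1, 0, and 1
```

Because each weight carries log2(3) ≈ 1.58 bits of information, this is where the "1.58-bit" figure comes from.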

πŸ“Š Benchmark Results

| Benchmark | Metric | BitMamba-2-1B | vs. 255M Baseline |
|---|---|---|---|
| ARC-Easy | Accuracy | 63.30% | +7.8% |
| PIQA | Accuracy | 68.77% | +4.4% |
| BoolQ | Accuracy | 62.35% | +3.1% |
| HellaSwag | Acc Norm | 45.59% | +10.4% |
| WikiText-2 | Perplexity | 29.62 | -22.1 |

For WikiText-2, lower perplexity is better; the delta is an absolute reduction.

Scaling from 255M to 1B parameters yields consistent improvements across every benchmark above.

*(Figure: scaling laws)*

πŸš€ Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the Files tab (or the `bitmamba_cpp` folder).

2. Run with C++

Go to our GitHub repository to get the inference code.

```shell
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
```

3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.

πŸ› οΈ Efficient Deployment

Running on a consumer Intel Core i3-12100F CPU:

| Model | RAM Usage | Speed |
|---|---|---|
| BitMamba-2-1B | 621 MB | ~53 tok/s |
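As a back-of-the-envelope sanity check on the footprint (assuming exactly 10^9 ternary weights packed at 2 bits each, a common packing for {-1, 0, 1}; the actual packing used by the engine is not specified here):

```python
# Ternary weights fit in 2 bits each when packed 4 per byte.
params = 1_000_000_000
packed_bytes = params * 2 // 8     # 250,000,000 bytes
packed_mib = packed_bytes / 2**20  # ~238.4 MiB for the weights alone
print(f"{packed_mib:.1f} MiB")     # prints "238.4 MiB"
```

The gap between this estimate and the measured 621 MB total would be taken up by runtime state (scales, embeddings, activations, KV-free SSM state), but that split is not broken out here.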

πŸ“œ Citation

@misc{salazar2026bitmamba2,
  author       = {Salazar, Jesus},
  title        = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18394665},
  url          = {https://doi.org/10.5281/zenodo.18394665}
}
