Model Summary
Aramis-2B-BitNet (2.41B parameters, 4096-token maximum sequence length)
A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
Runs natively as BitNet 1.58-bit (ternary) and is available as a 1.58-bit GGUF that is lossless with respect to the BF16 checkpoint.
Why BitNet (and why this model)
- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory and energy use and strong CPU/edge throughput compared with classic FP/INT SLMs (a quantization sketch follows this list). For more details on the underlying architecture and efficiency of BitNet, please refer to the official Microsoft Research publication: BitNet b1.58 2B4T Technical Report.
- Aramis demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
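The abs-mean ternarization at the core of BitNet b1.58 can be illustrated in a few lines of NumPy. This is a minimal sketch of the scheme described in the BitNet reports, not the model's actual kernels; the function and variable names are illustrative.

```python
import numpy as np

def absmean_ternarize(w: np.ndarray, eps: float = 1e-8):
    """Illustrative abs-mean ternarization in the style of BitNet b1.58.

    Weights are scaled by the mean absolute value of the matrix, rounded,
    and clipped to {-1, 0, +1}; the scale is returned so the ternary
    matrix can approximate the original weights at inference time.
    """
    scale = np.abs(w).mean() + eps                    # gamma: abs-mean of the weights
    w_ternary = np.clip(np.round(w / scale), -1, 1)   # ternary values {-1, 0, +1}
    return w_ternary.astype(np.int8), scale

# Toy usage: a random weight matrix and its ternary approximation
w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = absmean_ternarize(w)
print(w_q)          # ternary weights
print(w_q * gamma)  # dequantized approximation of w
```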
Model Variants
- jpacifico/Aramis-2B-BitNet-bf16 (this repo): the retrainable weights in BF16 format (a minimal loading sketch follows this list)
- jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF: quantized 1.58-bit GGUF version, for use with bitnet.cpp
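For the BF16 variant, a minimal loading sketch with Hugging Face transformers is shown below. It assumes a transformers version with BitNet b1.58 support (see the base model's card for exact installation requirements); the prompt is only an example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jpacifico/Aramis-2B-BitNet-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Simple chat-style generation
messages = [{"role": "user", "content": "Explique en une phrase ce qu'est BitNet."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```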
Training Recipe
Base model : microsoft/bitnet-b1.58-2B-4T-bf16
Post-Training Goal: agent-oriented behavior → better instruction following, contextual disambiguation, and pragmatic reasoning in multi-turn settings.
Iterative DPO + model merging:
- Bilingual DPO (FR+EN) to sharpen preference selection across the two languages, using the following datasets (a schematic training setup is sketched after this list):
  - jpacifico/french-orca-dpo-pairs-revised
  - Intel/orca_dpo_pairs
- Model merging (Model Stock and TIES methods, via MergeKit) to combine the complementary strengths of the bilingual variants (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
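To give an idea of the DPO stage, the sketch below sets up preference optimization with Hugging Face TRL on one of the listed datasets. It is a simplified illustration rather than the exact training script: the hyperparameters, the column mapping, and the output path are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "microsoft/bitnet-b1.58-2B-4T-bf16"  # base checkpoint used for post-training
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Map the preference pairs to the prompt/chosen/rejected format expected by
# DPOTrainer (the column names of the source dataset are assumed here).
raw = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")
train_dataset = raw.map(
    lambda ex: {"prompt": ex["question"], "chosen": ex["chosen"], "rejected": ex["rejected"]},
    remove_columns=raw.column_names,
)

args = DPOConfig(
    output_dir="aramis-dpo-fr",    # illustrative output path
    beta=0.1,                      # assumed preference temperature
    per_device_train_batch_size=2,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```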
First benchmarks
Interpretation: significant gains on language understanding and pragmatic reasoning (ARC-C/E, WinoGrande, BoolQ, HellaSwag, TriviaQA), with stability on other skills. Math and code are not the optimization target; GSM8K stays essentially stable relative to the quantized bitnet-b1.58-2B-4T baseline (58.38). All scores are reported in comparison with the original microsoft/bitnet-b1.58-2B-4T-bf16 model.
Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | jpacifico/Aramis-2B-BitNet-bf16 |
---|---|---|
arc_challenge 0 shot | 47.95 | 51.62 |
arc_easy 0 shot | 73.44 | 75.25 |
hellaswag 0 shot | 68.27 | 68.52 |
openbookqa 0 shot | 41.6 | 41.4 |
boolq 0 shot | 79.39 | 79.33 |
piqa 0 shot | 77.86 | 77.53 |
winogrande 0 shot | 70.64 | 72.06 |
ifeval 0 shot | 41.85 | 44.12 |
triviaqa 0 shot | 11.95 | 15.06 |
triviaqa 5 shot EM | 33.51 | 33.51 |
truthfulqa_mc2 10 shot | 45.89 | 46.52 |
gsm8k 4 shot EM | 62.4 | 59.67 |
mmlu 5 shot acc | 52.96 | 53.39 |
commonsense_qa 10 shot acc | 71.17 | 70.76 |
ARC-Challenge: 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
Model | arc_challenge (0 shot) |
---|---|
Qwen/Qwen3-1.7B | 43 |
ibm-granite/granite-3.3-2b-base | 44.54 |
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 34.9 |
openbmb/MiniCPM-2B-dpo-bf16 | 44.28 |
microsoft/bitnet-b1.58-2B-4T-bf16 (base model) | 47.95 |
microsoft/bitnet-b1.58-2B-4T | 49.91 |
jpacifico/Aramis-2B-BitNet-bf16 | 51.62 |
Reproducibility
All benchmark results reported here were obtained using LM Eval Harness.
The following example reproduces the ARC-Challenge (0-shot) evaluation for this model:
```bash
HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
    --model_args pretrained=jpacifico/Aramis-2B-BitNet-bf16,dtype=bfloat16 \
    --tasks arc_challenge \
    --device cuda:0 --batch_size 8 \
    --seed 42 \
    --num_fewshot 0 \
    --confirm_run_unsafe_code \
    --trust_remote_code
```
- All results were computed with LM Eval Harness v0.4.9
- Randomness (e.g. seeds, batch sizes) may cause slight variations in results
- The same procedure was used to evaluate all tasks presented in the benchmark tables
Usage with bitnet.cpp
You can run this model using my demo Colab notebook (TBD).
Please refer to the bitnet.cpp GitHub repository for detailed compilation steps, usage examples, and command-line options.
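As an illustration, the snippet below shells out to bitnet.cpp's inference script for the GGUF variant. It is a sketch under assumptions: the repository path and GGUF file name are placeholders, and the script name and flags follow the bitnet.cpp README at the time of writing and may change.

```python
import subprocess

BITNET_DIR = "BitNet"  # placeholder: path to a cloned and built bitnet.cpp checkout
GGUF_PATH = "models/Aramis-2B-BitNet-b1.58-i2s.gguf"  # placeholder: downloaded GGUF file

subprocess.run(
    [
        "python", "run_inference.py",
        "-m", GGUF_PATH,
        "-p", "You are a helpful assistant.",
        "-cnv",        # conversational (chat) mode
        "-n", "128",   # number of tokens to generate
        "-t", "4",     # CPU threads
    ],
    cwd=BITNET_DIR,
    check=True,
)
```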
Last checkpoint
Merge Method
This model was merged using the Model Stock merge method, with jpacifico/bitnet-dpo-merged-modelstock-retrain as the base.
Models Merged
The following models were included in the merge:
- jpacifico/bitnet-dpo-ties-retrained-mirror2
- jpacifico/bitnet-dpo-merged-modelstock2
- jpacifico/bitnet-dpo-merged-ties2
Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: jpacifico/bitnet-dpo-merged-ties2
  - model: jpacifico/bitnet-dpo-merged-modelstock2
  - model: jpacifico/bitnet-dpo-ties-retrained-mirror2
  - model: jpacifico/bitnet-dpo-merged-modelstock-retrain
merge_method: model_stock
base_model: jpacifico/bitnet-dpo-merged-modelstock-retrain
parameters:
  normalize: true
dtype: bfloat16
tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
```
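To reproduce a merge from such a configuration, MergeKit can be driven from Python as sketched below. This is a minimal sketch based on MergeKit's documented Python interface; the config path and output directory are placeholders, and the available options may vary across MergeKit versions.

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_PATH = "merge-config.yaml"  # the YAML configuration shown above
OUTPUT_DIR = "./aramis-merged"     # placeholder output directory

with open(CONFIG_PATH, encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_DIR,
    options=MergeOptions(
        cuda=False,           # CPU merge; set True if a GPU is available
        copy_tokenizer=True,  # carry over the tokenizer_source tokenizer
    ),
)
```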
Limitations
Not tuned for coding or formal math; prefer specialized variants if those are critical.
No explicit chain-of-thought training; improvements come from bilingual DPO + merging.
Disclaimer
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.
- Developed by: Jonathan Pacifico, 2025
- Model type: LLM
- Language(s) (NLP): French, English
- License: MIT
Made with ❤️ in France