
Table of Contents

  1. TL;DR
  2. Model Details
  3. Training Details
  4. Usage
  5. Evaluation
  6. Useful Links
  7. Citation

TL;DR

Falcon-E is a series of 1.58-bit (BitNet-style) causal decoder-only language models from the Technology Innovation Institute (TII). This repository hosts the GGUF checkpoint of the 3B Instruct variant, which delivers benchmark averages competitive with comparable bfloat16 models at a fraction of their memory footprint (955MB vs. 6GB+ at the 3B scale; see Evaluation below).

Model Details

Model Description

  • Developed by: https://www.tii.ae
  • Model type: Causal decoder-only / Instruct version
  • Architecture: Pure transformer, 1.58-bit version
  • Language(s) (NLP): English
  • License: Falcon-LLM License
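
For context on the name (a standard BitNet b1.58 convention; this card identifies the model as a BitNet variant): "1.58-bit" refers to ternary weights in {-1, 0, +1}, which carry log2(3) ≈ 1.585 bits of information each. A one-line check:

```python
import math

# Ternary weights {-1, 0, +1} carry log2(3) bits of information each,
# which is where the "1.58-bit" naming comes from.
print(math.log2(3))  # ~1.585
```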

Training Details

For more details about the training protocol of this model, please refer to the Falcon-E technical blogpost: https://falcon-lm.github.io/blog/falcon-edge.

Usage

Currently, you can use this model with either the Hugging Face transformers library or the BitNet library. There are multiple ways to interact with the model depending on your target use case. For each model in the Falcon-E series, three variants are available: the BitNet model, the prequantized checkpoint for fine-tuning, and the bfloat16 version of the BitNet model.

Inference

BitNet

# Set up Microsoft's BitNet inference framework
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt

# Download the 1.58-bit GGUF checkpoint and start an interactive chat session
huggingface-cli download tiiuae/Falcon-E-3B-Instruct-GGUF ggml-model-i2_s.gguf --local-dir models/Falcon-E-3B-Instruct/
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
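
Hugging Face transformers

The transformers path is mentioned above but not shown, so here is a minimal generation sketch. The model id tiiuae/Falcon-E-3B-Instruct (the non-GGUF counterpart of this repository) and the presence of a chat template on the instruct checkpoint are assumptions, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id for the non-GGUF counterpart of this repository.
model_id = "tiiuae/Falcon-E-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format a single-turn chat with the model's chat template.
messages = [{"role": "user", "content": "Give me a one-line summary of 1.58-bit LLMs."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```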

Fine-tuning

To fine-tune the model, load its prequantized revision and use the onebitllms Python package:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"
output_directory = "falcon-e-1b-sft"  # illustrative; should match your trainer's output_dir

# Load the prequantized revision, which is the one meant for fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized",
)
# Swap the model's linear layers for their BitNet counterparts
model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...  # dataset, training arguments, etc. (see the sketch below)
)

trainer.train()

# Convert the fine-tuned bfloat16 checkpoint into its final 1.58-bit form
quantize_to_1bit(output_directory)
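
The `...` above stands for the rest of the trainer configuration, which this card leaves out. For reference, a hedged sketch of a complete call (the dataset and hyperparameters are illustrative only, not prescribed by the card):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative dataset and hyperparameters; substitute your own.
train_dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model,                       # the BitNet-patched model from above
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="falcon-e-1b-sft",  # should match output_directory above
        max_steps=100,
    ),
)
```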

Evaluation

We report our internal pipeline benchmarks in the following tables:

Note: evaluation results are normalized scores on tasks from the former Hugging Face Open LLM Leaderboard v2.

For 1B scale models and below:

| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |

For 3B scale models:

| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |

Below are the results for instruction fine-tuned models:

For 1B scale models and below:

| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |

For 3B scale models:

| Model | Nb Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |
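
As a sanity check on the tables, the Avg. column appears to be the plain mean of the six task scores; for example, for the Falcon-E-3B-Instruct row above:

```python
# Avg. = mean of the six task scores (Falcon-E-3B-Instruct row above).
scores = [60.97, 15.3, 23.59, 2.12, 26.45, 7.45]
print(round(sum(scores) / len(scores), 2))  # 22.65
```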

Useful Links

  • Falcon-E technical blogpost: https://falcon-lm.github.io/blog/falcon-edge
  • Microsoft BitNet inference framework: https://github.com/microsoft/BitNet

Citation

If the Falcon-E family of models was helpful to your work, feel free to cite us:

@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}