TL;DR
Model Details
Training Details
Usage
Evaluation
Citation

TL;DR

Model Details

Model Description

Developed by: https://www.tii.ae
Model type: Causal decoder-only
Architecture: Pure-transformer - 1.58bit version
Language(s) (NLP): Mainly English
License: TII Falcon License 2.0

Training Details

The model was trained following the strategies described in the recent 1-bit LLM Hugging Face blog post and the 1-bit LLM paper. For more details about the training protocol of this model, please refer to the Compression section of the Falcon-3 technical report.
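To give a sense of what "1.58bit" refers to, here is a minimal sketch of ternary weight quantization with absmean scaling. It assumes the referenced 1-bit LLM paper is the BitNet b1.58 work. The helper name `absmean_ternary`, the `eps` value, and the sample weights are hypothetical and chosen only for illustration. This is not the training code of this model; the actual protocol is described in the technical report.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of floats to {-1, 0, +1} using absmean scaling.

    Hypothetical helper for illustration; not taken from the Falcon3 code.
    """
    # gamma: mean absolute value of the weights, used as the scale
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Scale, round to the nearest integer, then clip to [-1, 1]
    quantized = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return quantized, gamma

# Three possible values per weight carry log2(3) bits of information,
# which is where the "1.58" in the name comes from.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.6]  # made-up sample values
q, gamma = absmean_ternary(weights)
print(q, round(gamma, 3), round(bits_per_weight, 2))
```

Each weight ends up as -1, 0, or +1, with a single floating-point scale (`gamma`) kept per tensor so that magnitudes can be restored at inference time.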

Usage

You can currently use this model with either the Hugging Face transformers library or the BitNet library. You can also try the model in the falcon-1.58bit playground (available only for the 7B instruct version).

🤗 transformers

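import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Base-1.58bit"

# Load the tokenizer and the model (bfloat16 weights, moved to the GPU)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
  model_id,
  torch_dtype=torch.bfloat16,
).to("cuda")

# Perform text generation
prompt = "Hi how are you doing today?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))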

BitNet

git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon3-7B-Base-1.58bit -q i2_s
python run_inference.py -m models/Falcon3-7B-1.58bit/ggml-model-i2_s.gguf -p "Hi how are you doing today?" -cnv

Evaluation

The following table reports results from our internal benchmark pipeline.

Note: the evaluation results are normalized scores from the v2 leaderboard tasks, whereas the results reported for the original models in the blog post are raw scores.

Benchmark	Llama3-8B-1.58-100B-tokens	Falcon3-7B-Base-1.58bit
IFEval	17.91	25.43
MUSR	4.87	5.75
GPQA	1.83	2.32
BBH	5.36	3.91
MMLU-PRO	2.78	1.36
MATH	0.26	0.88
Average	5.50	6.61
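As a quick sanity check, the Average row can be recomputed from the six benchmark rows above. The sketch below also includes a hypothetical `normalize` helper showing one common way raw accuracies are rescaled to normalized scores (random-guess baseline maps to 0, the maximum maps to 100). That formula, the 0.25 baseline, and the 0.40 raw score are assumptions for illustration, not values taken from our evaluation pipeline.

```python
def normalize(raw, baseline, maximum=1.0):
    """Rescale a raw score so that `baseline` maps to 0 and `maximum` to 100.

    Hypothetical helper; scores at or below the baseline are floored to 0.
    """
    if raw <= baseline:
        return 0.0
    return (raw - baseline) / (maximum - baseline) * 100

# Scores copied from the table, in order:
# IFEval, MUSR, GPQA, BBH, MMLU-PRO, MATH
scores = {
    "Llama3-8B-1.58-100B-tokens": [17.91, 4.87, 1.83, 5.36, 2.78, 0.26],
    "Falcon3-7B-Base-1.58bit": [25.43, 5.75, 2.32, 3.91, 1.36, 0.88],
}

# Unweighted mean over the six benchmarks, rounded to two decimals
averages = {name: round(sum(v) / len(v), 2) for name, v in scores.items()}
print(averages)

# Illustrative only: a made-up raw accuracy of 0.40 on a 4-choice task
print(round(normalize(0.40, baseline=0.25), 2))
```

The recomputed means match the reported Average row, which confirms that it is an unweighted mean of the six normalized scores.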

Citation

@misc{Falcon3,
    title = {The Falcon 3 Family of Open Models},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}
