---
library_name: transformers
tags:
- bitnet
- falcon-e
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Training Details](#training-details)
3. [Usage](#usage)
4. [Evaluation](#evaluation)
5. [Citation](#citation)


# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only / Base version
- **Architecture:** Pure transformer, 1.58-bit version
- **Language(s) (NLP):** English
- **License:** Falcon-LLM License

# Training Details

For more details about the training protocol of this model, please refer to the [Falcon-E technical blogpost](https://falcon-lm.github.io/blog/falcon-edge/).
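
The 1.58-bit format means every linear-layer weight takes one of three values, {-1, 0, 1} (log2(3) ≈ 1.58 bits). As a rough illustration only (not the card's actual training or quantization code), a BitNet-style absmean quantizer can be sketched as:

```python
def absmean_quantize(weights, eps=1e-8):
    """Illustrative BitNet-b1.58-style quantizer: map floats to {-1, 0, 1}.

    Scale by the mean absolute weight, then round and clip each value to
    the nearest ternary level; the scale is kept for dequantization.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

print(absmean_quantize([0.9, -0.05, -1.2, 0.4])[0])  # -> [1, 0, -1, 1]
```

The real kernels pack these ternary values into a compact integer format (the `i2_s` GGUF type used below); this sketch only shows the mapping.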

# Usage

You can currently use this model with either the Hugging Face `transformers` library or the [BitNet](https://github.com/microsoft/BitNet) library. There are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the `bfloat16` version of the BitNet model.

## Inference

### BitNet

```bash
git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
huggingface-cli download tiiuae/Falcon-E-3B-Instruct-GGUF ggml-model-i2_s.gguf --local-dir models/Falcon-E-3B-Instruct/
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```

## Fine-tuning

For fine-tuning, load the `prequantized` revision of the model and use the `onebitllms` Python package:

```diff
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
+ from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-1B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
+   revision="prequantized"
)
+ model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...
)

trainer.train()

+ quantize_to_1bit(output_directory)
```

# Evaluation

We report our internal pipeline benchmarks in the tables below.

**Note: evaluation results are normalized scores from the former Hugging Face leaderboard v2 tasks.**

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | Num Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | **635MB** | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |

</details>


<details>
<summary class="bold"> For 3B scale models </summary>

| Model | Num Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | **955MB** | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |

</details>
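
The memory-footprint gap follows from the ternary format: at log2(3) ≈ 1.58 bits per weight, 3B parameters need roughly 0.6 GB for the quantized weights, with the rest of the reported footprint presumably coming from embeddings and other non-quantized tensors. A back-of-the-envelope estimate (not the exact on-disk packing):

```python
import math

params = 3e9                    # 3B parameters
bits_per_weight = math.log2(3)  # ternary {-1, 0, 1} -> ~1.58 bits
approx_gb = params * bits_per_weight / 8 / 1e9
print(f"{approx_gb:.2f} GB")    # -> 0.59 GB
```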

Below are the results for the instruction fine-tuned models:

<details>
<summary class="bold"> For 1B scale models and below </summary>

| Model | Num Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.52 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | **635MB** | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |

</details>


<details>
<summary class="bold"> For 3B scale models </summary>

| Model | Num Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
| -------- | ------- | ------- | ------- | ------ | ----- | ----- | ----- | ------ | ---- |
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | **955MB** | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |

</details>
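
The Avg. column appears to be the plain arithmetic mean of the six task scores (the unrounded mean matches the reported values). For example, for Falcon-E-3B-Instruct:

```python
# Per-task scores for Falcon-E-3B-Instruct from the table above.
scores = {"IFEVAL": 60.97, "Math-Hard": 15.3, "GPQA": 23.59,
          "MuSR": 2.12, "BBH": 26.45, "MMLU-Pro": 7.45}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # -> 22.65
```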


## Useful links

- View [our release blogpost](https://falcon-lm.github.io/blog/falcon-edge/).
- Learn more about the [`onebitllms` library](https://github.com/tiiuae/onebitllms).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have questions or want to interact with our researchers and developers.

# Citation

If the Falcon-E family of models was helpful to your work, feel free to cite us:

```bibtex
@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}
```