# Zarra: Arabic Static Embedding Model
Zarra is a static embedding model built using the Model2Vec distillation framework. It is a distilled version of a Sentence Transformer, specifically optimized for the Arabic language. Unlike traditional transformer-based models, Zarra produces static embeddings, enabling ultra-fast inference on both CPU and GPU—making it ideal for resource-constrained environments or real-time applications.
## Why Zarra?

- ⚡ **Exceptional Speed:** Delivers embeddings up to 500x faster than sentence transformers.
- 🧠 **Compact & Efficient:** Up to 50x smaller in size, allowing easy deployment on edge devices.
- 🧰 **Versatile:** Well-suited for search, clustering, classification, deduplication, and more.
- 🌍 **Arabic-First:** Specifically trained on high-quality Arabic data, ensuring relevance and performance across a range of Arabic NLP tasks.
## About Model2Vec
The Model2Vec distillation technique transfers knowledge from large transformer models into lightweight static embedding spaces, preserving semantic quality while dramatically improving speed and efficiency. Zarra represents the best of both worlds: the semantic power of transformers and the speed and simplicity of static vectors.
## Installation

Install model2vec using pip:

```bash
pip install model2vec
```
## Usage

### Using Model2Vec

The Model2Vec library is the fastest and most lightweight way to run Model2Vec models.

Load this model using the `from_pretrained` method:

```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("NAMAA-Space/zarra")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
### Using Sentence Transformers

You can also load and use the model with the Sentence Transformers library:

```python
from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("NAMAA-Space/zarra")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
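Once you have embeddings from either snippet above, downstream tasks such as semantic search reduce to simple vector math. A minimal NumPy sketch, with random vectors standing in for real `model.encode(...)` output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-ins for model.encode(...) output: 3 documents and 1 query, 256-dim each
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(3, 256))
query_embedding = rng.normal(size=(1, 256))

# Rank documents by similarity to the query, most similar first
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
ranking = np.argsort(-scores)
print(ranking)
```

The same pattern (encode once, compare with cosine similarity) covers search, deduplication, and clustering.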
## How it Works
Model2Vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.
It works by passing a vocabulary through a sentence transformer model, then reducing the dimensionality of the resulting embeddings using PCA, and finally weighting the embeddings using SIF weighting. During inference, we simply take the mean of all token embeddings occurring in a sentence.
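The inference step described above, averaging the static vectors of a sentence's tokens, can be sketched in a few lines of NumPy. The vocabulary, dimensionality, and whitespace tokenizer below are toy stand-ins, not the model's actual tokenizer or embedding table:

```python
import numpy as np

# Toy static embedding table: one row per token (dimensions here are hypothetical)
rng = np.random.default_rng(42)
vocab = {"مرحبا": 0, "بالعالم": 1, "جملة": 2, "مثال": 3}
embedding_table = rng.normal(size=(len(vocab), 8))

def encode(sentence: str) -> np.ndarray:
    """Static-embedding inference: look up each token's vector and average them."""
    token_ids = [vocab[tok] for tok in sentence.split() if tok in vocab]
    return embedding_table[token_ids].mean(axis=0)

sentence_vec = encode("مرحبا بالعالم")
print(sentence_vec.shape)  # (8,)
```

Because encoding is just table lookups and a mean, there is no forward pass through a transformer, which is where the large speedup comes from.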
## Benchmark on Arabic

### Speed
Model | Speed (sentences/second) | Device |
---|---|---|
zarra | 26893.63 | cpu |
bojji | 27478.15 | cpu |
potion-multilingual-128M | 27145.31 | cpu |
paraphrase-multilingual-MiniLM-L12-v2 | 2363.24 | cuda |
silma_ai_embedding_sts_v0.1 | 627.13 | cuda |
muffakir_embedding | 621.77 | cuda |
get_multilingual_base | 895.41 | cuda |
arabic_retrieval_v1.0 | 618.56 | cuda |
arabic_triplet_matryoshka_v2 | 610.64 | cuda |
Zarra and Bojji excel in speed, achieving 26893.63 and 27478.15 sentences per second on CPU, respectively, far surpassing CUDA-based models like arabic_triplet_matryoshka_v2 (610.64).
Top Performer: Bojji is the fastest model, slightly ahead of potion-multilingual-128M (27145.31) and Zarra, highlighting the efficiency of Model2Vec-based models on CPU.
Key Observation: The high speed of Zarra and Bojji on CPU makes them ideal for resource-constrained environments, offering significant advantages over CUDA-dependent models.
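For context, throughput figures like those in the table are typically obtained by timing a batched `encode` call. A minimal sketch, assuming any encoder callable; the `dummy_encode` stand-in below is hypothetical and should be replaced by `model.encode` from either library above:

```python
import time

def measure_throughput(encode, sentences, repeats=5):
    """Return sentences encoded per second, using the best of `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        best = min(best, time.perf_counter() - start)
    return len(sentences) / best

# Hypothetical stand-in encoder; swap in model.encode for a real measurement
dummy_encode = lambda batch: [len(s) for s in batch]
sentences = ["جملة مثال"] * 1000
print(f"{measure_throughput(dummy_encode, sentences):,.0f} sentences/second")
```

Taking the best of several runs reduces noise from warm-up and background load.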
### Size of the Model
Model | Parameters (M) | Size (MB) | Relative to Largest (%) | Smaller than Largest (x) |
---|---|---|---|---|
zarra | 64.00 | 244.14 | 41.92 | 2.39 |
bojji | 124.88 | 476.40 | 81.79 | 1.22 |
potion-multilingual-128M | 128.09 | 488.63 | 83.89 | 1.19 |
paraphrase-multilingual-MiniLM-… | 117.65 | 448.82 | 77.06 | 1.30 |
silma_ai_embedding_sts_v0.1 | 135.19 | 515.72 | 88.54 | 1.13 |
muffakir_embedding | 135.19 | 515.72 | 88.54 | 1.13 |
arabic_retrieval_v1.0 | 135.19 | 515.73 | 88.54 | 1.13 |
arabic_triplet_matryoshka_v2 | 135.19 | 515.72 | 88.54 | 1.13 |
get_multilingual_base | 305.37 | 582.45 | 100.00 | 1.00 |
Zarra is the smallest model, with only 64 million parameters and 244.14 MB in size, making it 2.39 times smaller than the largest model (get_multilingual_base).
Bojji is slightly larger at 124.88 million parameters and 476.40 MB, but still significantly smaller than most other models.
Top Performer: Zarra leads in compactness, offering the smallest footprint, which is critical for deployment on resource-limited devices.
Key Observation: The compact size of Zarra and Bojji aligns with their design goal of efficiency, making them highly suitable for edge computing and real-time applications.
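As a sanity check, Zarra's size column follows directly from its parameter count assuming float32 (4-byte) weights. Note that not every row appears to use the same precision: get_multilingual_base's 582.45 MB at 305.37M parameters is consistent with float16 storage.

```python
# Convert a parameter count (in millions) to on-disk size in MB,
# assuming float32 (4 bytes per parameter)
def params_to_mb(params_millions: float) -> float:
    return params_millions * 1e6 * 4 / 1024**2

zarra_mb = params_to_mb(64.00)
largest_mb = 582.45  # get_multilingual_base, from the table

print(round(zarra_mb, 2))               # 244.14, matching the table
print(round(largest_mb / zarra_mb, 2))  # 2.39, the "smaller than largest" factor
```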
### Accuracy

Model | Avg | MIRAC | MLQAR | Massi | Multi | STS17 | STS22 | XNLI_ |
---|---|---|---|---|---|---|---|---|
arabic_triplet_matryoshka_v2 | 0.6610 | 0.6262 | 0.5093 | 0.5577 | 0.5868 | 0.8531 | 0.6396 | 0.8542 |
muffakir_embedding | 0.6494 | 0.6424 | 0.5267 | 0.5462 | 0.5943 | 0.8485 | 0.6291 | 0.7583 |
arabic_retrieval_v1.0 | 0.6473 | 0.6159 | 0.5674 | 0.5832 | 0.5993 | 0.8002 | 0.6254 | 0.7393 |
gate_arabert-v1 | 0.6444 | 0.5774 | 0.4808 | 0.5345 | 0.5847 | 0.8278 | 0.6310 | 0.8746 |
get_multilingual_base | 0.6440 | 0.7177 | 0.5698 | 0.5071 | 0.5521 | 0.7881 | 0.6145 | 0.7584 |
arabic_sts_matryoshka | 0.6413 | 0.5828 | 0.4840 | 0.5457 | 0.5494 | 0.8290 | 0.6242 | 0.8740 |
silma_ai_embedding_sts_v0.1 | 0.6138 | 0.3799 | 0.5011 | 0.5600 | 0.5749 | 0.8559 | 0.6122 | 0.8125 |
Arabic-MiniLM-L12-v2-all-nli-triplet | 0.5431 | 0.2240 | 0.3612 | 0.4775 | 0.5698 | 0.8111 | 0.5540 | 0.8043 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.5208 | 0.2191 | 0.3496 | 0.4515 | 0.5573 | 0.7916 | 0.4908 | 0.7859 |
bojji | 0.5177 | 0.2941 | 0.3989 | 0.4667 | 0.5433 | 0.7233 | 0.5880 | 0.6094 |
zarra | 0.4822 | 0.2295 | 0.3473 | 0.4119 | 0.5237 | 0.6469 | 0.6218 | 0.5942 |
potion-multilingual-128M | 0.4699 | 0.1658 | 0.3150 | 0.4285 | 0.5338 | 0.6511 | 0.5951 | 0.5999 |
all_minilm_l6_v2 | 0.2843 | 0.0005 | 0.0064 | 0.1905 | 0.4934 | 0.5089 | 0.2518 | 0.5384 |
### Sorted by STS17_main (Score)
Model Name | STS17_main |
---|---|
silma_ai_embedding_sts_v0.1 | 0.856 |
arabic_triplet_matryoshka_v2 | 0.853 |
muffakir_embedding | 0.849 |
arabic_sts_matryoshka | 0.829 |
gate_arabert-v1 | 0.828 |
Arabic-MiniLM-L12-v2-all-nli-triplet | 0.811 |
arabic_retrieval_v1.0 | 0.800 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.792 |
get_multilingual_base | 0.788 |
bojji | 0.723 |
potion-multilingual-128M | 0.651 |
zarra | 0.647 |
all_minilm_l6_v2 | 0.509 |
### Sorted by STS22.v2_main (Score)
Model Name | STS22.v2_main |
---|---|
arabic_triplet_matryoshka_v2 | 0.640 |
gate_arabert-v1 | 0.631 |
muffakir_embedding | 0.629 |
arabic_retrieval_v1.0 | 0.625 |
arabic_sts_matryoshka | 0.624 |
zarra | 0.622 |
get_multilingual_base | 0.615 |
silma_ai_embedding_sts_v0.1 | 0.612 |
potion-multilingual-128M | 0.595 |
bojji | 0.588 |
Arabic-MiniLM-L12-v2-all-nli-triplet | 0.554 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.491 |
all_minilm_l6_v2 | 0.252 |
## Additional Resources
Model tree for NAMAA-Space/zarra:

- Base model: jinaai/jina-embeddings-v3