# Zarra: Arabic Static Embedding Model


Zarra is a static embedding model built with the Model2Vec distillation framework. It is distilled from a Sentence Transformer and optimized specifically for Arabic. Unlike traditional transformer-based models, Zarra produces static embeddings, enabling ultra-fast inference on both CPU and GPU, which makes it ideal for resource-constrained environments and real-time applications.

## Why Zarra?

- ⚡ **Exceptional speed:** delivers embeddings up to 500x faster than sentence transformers.
- 🧠 **Compact & efficient:** up to 50x smaller, allowing easy deployment on edge devices.
- 🧰 **Versatile:** well-suited for search, clustering, classification, deduplication, and more.
- 🌍 **Arabic-first:** trained on high-quality Arabic data, ensuring relevance and strong performance across a range of Arabic NLP tasks.

*(Figure: speed vs. performance chart)*

## About Model2Vec

The Model2Vec distillation technique transfers knowledge from large transformer models into lightweight static embedding spaces, preserving semantic quality while dramatically improving speed and efficiency. Zarra represents the best of both worlds: the semantic power of transformers and the speed and simplicity of static vectors.

## Installation

Install `model2vec` using pip:

```bash
pip install model2vec
```

## Usage

### Using Model2Vec

The Model2Vec library is the fastest and most lightweight way to run Model2Vec models.

Load the model using the `from_pretrained` method:

```python
from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("NAMAA-Space/zarra")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
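Because `encode` returns a plain NumPy array, downstream tasks such as semantic search come down to simple vector math. Here is a minimal sketch; the Arabic sentences are illustrative examples, not part of the model card:

```python
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("NAMAA-Space/zarra")

# Illustrative corpus and query (any Arabic text works the same way)
docs = [
    "القاهرة هي عاصمة مصر",      # "Cairo is the capital of Egypt"
    "كرة القدم رياضة شعبية",      # "Football is a popular sport"
    "تقع الرياض في السعودية",     # "Riyadh is located in Saudi Arabia"
]
query = "ما هي عاصمة مصر؟"        # "What is the capital of Egypt?"

doc_embs = model.encode(docs)         # shape: (len(docs), dim)
query_emb = model.encode([query])[0]  # shape: (dim,)

# Rank documents by cosine similarity to the query
doc_norms = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
query_norm = query_emb / np.linalg.norm(query_emb)
scores = doc_norms @ query_norm
print(docs[int(np.argmax(scores))])  # best-matching document
```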

### Using Sentence Transformers

You can also load and run the model through the Sentence Transformers library:

```python
from sentence_transformers import SentenceTransformer

# Load a pretrained Sentence Transformer model
model = SentenceTransformer("NAMAA-Space/zarra")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
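Recent versions of Sentence Transformers (v3.0+) also expose a `similarity` method for scoring embeddings directly. A short sketch, with illustrative sentences:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("NAMAA-Space/zarra")

# Two illustrative sentences; a high score indicates related meaning
sentences = ["أحب القراءة في المساء", "القراءة هوايتي المفضلة"]
embeddings = model.encode(sentences)

# Pairwise cosine similarities (requires sentence-transformers >= 3.0)
scores = model.similarity(embeddings, embeddings)
print(scores)  # 2x2 matrix; the off-diagonal entries score the pair
```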

## How it Works

Model2Vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.

It works by passing a vocabulary through a sentence transformer model, reducing the dimensionality of the resulting embeddings with PCA, and finally weighting them using SIF (smooth inverse frequency) weighting. At inference time, the embedding of a sentence is simply the mean of the embeddings of all tokens occurring in it.
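For illustration, this is roughly what a Model2Vec distillation call looks like using the library's `distill` function. The teacher model named below is an assumption for the example; the exact teacher and settings used to produce Zarra are not stated in this card, and distillation may require the extra dependencies installed via `pip install model2vec[distill]`:

```python
from model2vec.distill import distill

# Illustrative teacher model; the actual teacher used for Zarra is not
# specified here. pca_dims controls the output embedding dimensionality.
m2v_model = distill(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    pca_dims=256,
)

# The result is a StaticModel: per-token vectors from the teacher, reduced
# with PCA and reweighted, then mean-pooled over tokens at inference time.
m2v_model.save_pretrained("my-arabic-static-model")
embeddings = m2v_model.encode(["جملة للتجربة"])  # "A test sentence"
```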

## Benchmark on Arabic

### Speed

| Model | Speed (sentences/second) | Device |
|---|---|---|
| zarra | 26893.63 | CPU |
| bojji | 27478.15 | CPU |
| potion-multilingual-128M | 27145.31 | CPU |
| paraphrase-multilingual-MiniLM-L12-v2 | 2363.24 | CUDA |
| silma_ai_embedding_sts_v0.1 | 627.13 | CUDA |
| muffakir_embedding | 621.77 | CUDA |
| get_multilingual_base | 895.41 | CUDA |
| arabic_retrieval_v1.0 | 618.56 | CUDA |
| arabic_triplet_matryoshka_v2 | 610.64 | CUDA |
- Zarra and Bojji excel in speed, reaching 26893.63 and 27478.15 sentences per second on CPU, respectively, far surpassing CUDA-based models such as arabic_triplet_matryoshka_v2 (610.64).

- **Top performer:** Bojji is the fastest model, narrowly ahead of potion-multilingual-128M (27145.31) and Zarra, highlighting the efficiency of Model2Vec-based models on CPU.

- **Key observation:** the high CPU throughput of Zarra and Bojji makes them ideal for resource-constrained environments, a significant advantage over CUDA-dependent models.

### Model Size

| Model | Parameters (M) | Size (MB) | Relative to Largest (%) | Smaller than Largest (x) |
|---|---|---|---|---|
| zarra | 64.00 | 244.14 | 41.92 | 2.39 |
| bojji | 124.88 | 476.40 | 81.79 | 1.22 |
| potion-multilingual-128M | 128.09 | 488.63 | 83.89 | 1.19 |
| paraphrase-multilingual-MiniLM-L12-v2 | 117.65 | 448.82 | 77.06 | 1.30 |
| silma_ai_embedding_sts_v0.1 | 135.19 | 515.72 | 88.54 | 1.13 |
| muffakir_embedding | 135.19 | 515.72 | 88.54 | 1.13 |
| arabic_retrieval_v1.0 | 135.19 | 515.73 | 88.54 | 1.13 |
| arabic_triplet_matryoshka_v2 | 135.19 | 515.72 | 88.54 | 1.13 |
| get_multilingual_base | 305.37 | 582.45 | 100.00 | 1.00 |
- Zarra is the smallest model, with only 64 million parameters and 244.14 MB on disk, making it 2.39x smaller than the largest model (get_multilingual_base).

- Bojji is slightly larger at 124.88 million parameters and 476.40 MB, but still significantly smaller than most other models.

- **Top performer:** Zarra leads in compactness, offering the smallest footprint, which is critical for deployment on resource-limited devices.

- **Key observation:** the compact size of Zarra and Bojji aligns with their design goal of efficiency, making them highly suitable for edge computing and real-time applications.

### Task Performance

| Model | Avg | MIRAC | MLQAR | Massi | Multi | STS17 | STS22 | XNLI |
|---|---|---|---|---|---|---|---|---|
| arabic_triplet_matryoshka_v2 | 0.6610 | 0.6262 | 0.5093 | 0.5577 | 0.5868 | 0.8531 | 0.6396 | 0.8542 |
| muffakir_embedding | 0.6494 | 0.6424 | 0.5267 | 0.5462 | 0.5943 | 0.8485 | 0.6291 | 0.7583 |
| arabic_retrieval_v1.0 | 0.6473 | 0.6159 | 0.5674 | 0.5832 | 0.5993 | 0.8002 | 0.6254 | 0.7393 |
| gate_arabert-v1 | 0.6444 | 0.5774 | 0.4808 | 0.5345 | 0.5847 | 0.8278 | 0.6310 | 0.8746 |
| get_multilingual_base | 0.6440 | 0.7177 | 0.5698 | 0.5071 | 0.5521 | 0.7881 | 0.6145 | 0.7584 |
| arabic_sts_matryoshka | 0.6413 | 0.5828 | 0.4840 | 0.5457 | 0.5494 | 0.8290 | 0.6242 | 0.8740 |
| silma_ai_embedding_sts_v0.1 | 0.6138 | 0.3799 | 0.5011 | 0.5600 | 0.5749 | 0.8559 | 0.6122 | 0.8125 |
| Arabic-MiniLM-L12-v2-all-nli-triplet | 0.5431 | 0.2240 | 0.3612 | 0.4775 | 0.5698 | 0.8111 | 0.5540 | 0.8043 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.5208 | 0.2191 | 0.3496 | 0.4515 | 0.5573 | 0.7916 | 0.4908 | 0.7859 |
| bojji | 0.5177 | 0.2941 | 0.3989 | 0.4667 | 0.5433 | 0.7233 | 0.5880 | 0.6094 |
| zarra | 0.4822 | 0.2295 | 0.3473 | 0.4119 | 0.5237 | 0.6469 | 0.6218 | 0.5942 |
| potion-multilingual-128M | 0.4699 | 0.1658 | 0.3150 | 0.4285 | 0.5338 | 0.6511 | 0.5951 | 0.5999 |
| all_minilm_l6_v2 | 0.2843 | 0.0005 | 0.0064 | 0.1905 | 0.4934 | 0.5089 | 0.2518 | 0.5384 |

### Sorted by STS17_main (Score)

| Model Name | STS17_main |
|---|---|
| silma_ai_embedding_sts_v0.1 | 0.856 |
| arabic_triplet_matryoshka_v2 | 0.853 |
| muffakir_embedding | 0.849 |
| arabic_sts_matryoshka | 0.829 |
| gate_arabert-v1 | 0.828 |
| Arabic-MiniLM-L12-v2-all-nli-triplet | 0.811 |
| arabic_retrieval_v1.0 | 0.800 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.792 |
| get_multilingual_base | 0.788 |
| bojji | 0.723 |
| potion-multilingual-128M | 0.651 |
| zarra | 0.647 |
| all_minilm_l6_v2 | 0.509 |

### Sorted by STS22.v2_main (Score)

| Model Name | STS22.v2_main |
|---|---|
| arabic_triplet_matryoshka_v2 | 0.640 |
| gate_arabert-v1 | 0.631 |
| muffakir_embedding | 0.629 |
| arabic_retrieval_v1.0 | 0.625 |
| arabic_sts_matryoshka | 0.624 |
| zarra | 0.622 |
| get_multilingual_base | 0.615 |
| silma_ai_embedding_sts_v0.1 | 0.612 |
| potion-multilingual-128M | 0.595 |
| bojji | 0.588 |
| Arabic-MiniLM-L12-v2-all-nli-triplet | 0.554 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.491 |
| all_minilm_l6_v2 | 0.252 |
