GATE-AraBert-v0

GATE-AraBert-v0 is a General Arabic Text Embedding (GATE) model trained with SentenceTransformers in a multi-task setup on the AllNLI and STS datasets.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- all-nli
- sts
Language: ar
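
The model description lists cosine similarity as the comparison function. As a minimal sketch of what that score computes, the snippet below uses pure Python and made-up 3-dimensional vectors in place of the 768-dimensional embeddings. The function name `cosine_similarity` and the toy values are illustrative assumptions, not part of the library or real model output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical values, not real model output).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, different length
c = [-3.0, 0.0, 1.0]  # orthogonal to a (dot product is zero)

sim_ab = cosine_similarity(a, b)  # close to 1: same direction
sim_ac = cosine_similarity(a, c)  # 0: no shared direction
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.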

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")
# Run inference
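sentences = [
    # "The brown dog is lying on its side on a beige rug, with a green object in the foreground."
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',
    # "The dog has died"
    'لقد مات الكلب',
    # "A tall person"
    'شخص طويل القامة',
]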
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
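
The similarity matrix is square: entry [i][j] scores sentence i against sentence j, and the diagonal compares each sentence with itself. The following is a minimal sketch of reading off each sentence's closest neighbour from such a matrix. It uses hypothetical scores rather than real model output, and the helper `nearest_neighbours` is an illustrative name, not a library function. It assumes the matrix has been converted to nested Python lists (for a tensor, for example, with `.tolist()`).

```python
def nearest_neighbours(matrix):
    """For each row i, return the index j != i with the highest score."""
    if len(matrix) < 2:
        raise ValueError("need at least two sentences to find a neighbour")
    result = []
    for i, row in enumerate(matrix):
        # Skip the diagonal, which compares a sentence with itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        _, best_j = max(candidates)
        result.append(best_j)
    return result

# Hypothetical 3x3 similarity matrix (made-up values for illustration).
scores = [
    [1.00, 0.42, 0.10],
    [0.42, 1.00, 0.05],
    [0.10, 0.05, 1.00],
]
neighbours = nearest_neighbours(scores)
```

With these made-up scores, sentences 0 and 1 pick each other, and sentence 2 falls back to sentence 0 as its least distant neighbour.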

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.8384
spearman_cosine	0.8389
pearson_manhattan	0.8248
spearman_manhattan	0.8329
pearson_euclidean	0.825
spearman_euclidean	0.8337
pearson_dot	0.8072
spearman_dot	0.8098
pearson_max	0.8384
spearman_max	0.8389
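
The Pearson and Spearman rows measure how well the model's similarity scores track the gold STS labels: Pearson on the raw values, Spearman on their ranks. The sketch below shows both in pure Python, assuming this standard definition. The gold labels and predicted scores are made up for illustration, and the helper names are hypothetical; the evaluator's own implementation may differ in detail.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gold labels (0-5 scale) and cosine scores, not real data.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.40, 0.70, 0.95]

pearson_cosine = pearson(gold, predicted)
spearman_cosine = spearman(gold, predicted)
```

In this toy case the predicted scores are in exactly the same order as the gold labels, so Spearman is perfect while Pearson is slightly lower because the relationship is not exactly linear.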

Semantic Similarity

Dataset: sts-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7908
spearman_cosine	0.7893
pearson_manhattan	0.7923
spearman_manhattan	0.7947
pearson_euclidean	0.7904
spearman_euclidean	0.7934
pearson_dot	0.7404
spearman_dot	0.7354
pearson_max	0.7923
spearman_max	0.7947
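
The `pearson_max` and `spearman_max` rows appear to report the best score across the four similarity functions (cosine, Manhattan, Euclidean, dot); the values in the table above are consistent with that reading. A small sketch that checks this against the sts-test numbers, where the helper `best_across_functions` is a hypothetical name introduced for illustration:

```python
# sts-test values copied from the table above.
sts_test = {
    "pearson_cosine": 0.7908,
    "spearman_cosine": 0.7893,
    "pearson_manhattan": 0.7923,
    "spearman_manhattan": 0.7947,
    "pearson_euclidean": 0.7904,
    "spearman_euclidean": 0.7934,
    "pearson_dot": 0.7404,
    "spearman_dot": 0.7354,
}

def best_across_functions(metrics, correlation):
    """Return (metric_name, value) of the top score for one correlation type."""
    per_function = {
        name: value
        for name, value in metrics.items()
        if name.startswith(correlation + "_")
    }
    best_name = max(per_function, key=per_function.get)
    return best_name, per_function[best_name]

best_pearson = best_across_functions(sts_test, "pearson")
best_spearman = best_across_functions(sts_test, "spearman")
```

On sts-test both maxima come from Manhattan distance rather than cosine, which matches the reported `pearson_max` of 0.7923 and `spearman_max` of 0.7947.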
