Mizan-Rerank-v1

An open-source model for reranking long Arabic texts with high efficiency and accuracy.

Overview

Mizan-Rerank-v1 is an open-source model based on the Transformer architecture, designed for reranking search results in Arabic. With 149 million parameters, it matches or slightly exceeds the larger models in the benchmarks below while using less memory and responding faster.

Key Features

Lightweight & Efficient: 149M parameters vs competitors with 278-568M parameters
Long Text Processing: Handles up to 8192 tokens with sliding window technique
High-Speed Inference: 2-3x faster than the compared models (0.1 s vs 0.2-0.3 s on an RTX 4090)
Arabic Language Optimization: Specifically fine-tuned for Arabic language nuances
Resource Efficient: 60-75% less memory than the compared models (1 GB vs 2.5-4 GB)
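
The source does not spell out what the "sliding window technique" refers to; it may describe the model's internal attention rather than any chunking of the input. Separately from that, a passage longer than the 8192-token limit has to be shortened or split before scoring. The sketch below shows one common workaround, not the author's documented method: split the token IDs into overlapping windows, score each window, and keep the best score. The names sliding_windows, score_long_passage, and score_window are hypothetical, and the stride of 4096 is an arbitrary choice.

```python
def sliding_windows(token_ids, window=8192, stride=4096):
    """Split a token-id list into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


def score_long_passage(token_ids, score_window, window=8192, stride=4096):
    """Score every window with `score_window` (hypothetical callable) and keep the maximum."""
    return max(score_window(w) for w in sliding_windows(token_ids, window, stride))


# Toy check with small numbers: 10 tokens, window of 4, stride of 2.
tokens = list(range(10))
for w in sliding_windows(tokens, window=4, stride=2):
    print(w)
```

In a real cross-encoder the query and special tokens share the same length budget, so the window for the passage would need to be smaller than 8192. Taking the maximum treats a passage as relevant if any part of it is; averaging the window scores is an equally plausible choice.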

Performance Benchmarks

Hardware Performance (RTX 4090 24GB)

Model	RAM Usage	Response Time
Mizan-Rerank-v1	1 GB	0.1 seconds
bge-reranker-v2-m3	4 GB	0.3 seconds
jina-reranker-v2-base-multilingual	2.5 GB	0.2 seconds

MIRACL Dataset Results (NDCG@10)

Model	Score
Mizan-Rerank-v1	0.8865
bge-reranker-v2-m3	0.8863
jina-reranker-v2-base-multilingual	0.8481
Namaa-ARA-Reranker-V1	0.7941
Namaa-Reranker-v1	0.7176
ms-marco-MiniLM-L12-v2	0.1750
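
For readers unfamiliar with the metric in the table above, NDCG@10 compares the discounted gain of the top 10 results, in the order the reranker returned them, with the gain of an ideal ordering. A score of 1.0 means the ranking is as good as it can be. The source does not include its evaluation script, so the following is only a minimal sketch of the common linear-gain definition; the relevance labels are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical binary labels for one query, listed in the order a reranker returned them.
perfect = [1, 1, 0, 0, 0]   # both relevant passages ranked first
swapped = [0, 1, 1, 0, 0]   # an irrelevant passage ranked first

print(ndcg_at_k(perfect))
print(ndcg_at_k(swapped))
```

With binary labels, the linear-gain and exponential-gain variants of NDCG give identical values, so the choice of variant only matters for graded relevance.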

Reranking and Triplet Datasets (NDCG@10)

Model	Reranking Dataset	Triplet Dataset
Mizan-Rerank-v1	1.0000	1.0000
bge-reranker-v2-m3	1.0000	0.9998
jina-reranker-v2-base-multilingual	1.0000	1.0000
Namaa-ARA-Reranker-V1	1.0000	0.9989
Namaa-Reranker-v1	1.0000	0.9994
ms-marco-MiniLM-L12-v2	0.8906	0.9087

Training Methodology

Mizan-Rerank-v1 was trained on a diverse corpus of 741,159,981 tokens from:

Authentic Arabic open-source datasets
Manually crafted and processed text
Purpose-generated synthetic data

This mix of sources is intended to expose the model to a broad range of Arabic linguistic contexts.

How It Works

Query reception: The model receives a user query and candidate texts
Content analysis: Analyzes semantic relationships between query and each text
Relevance scoring: Assigns a relevance score to each text
Reranking: Sorts results by descending relevance score

Usage Examples

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("ALJIACHI/Mizan-Rerank-v1")
tokenizer = AutoTokenizer.from_pretrained("ALJIACHI/Mizan-Rerank-v1")

# Function to calculate relevance score (raw logit; higher means more relevant)
def get_relevance_score(query, passage):
    # Encode the query and passage as one pair, truncated to the 8192-token limit
    inputs = tokenizer(query, passage, return_tensors="pt", padding=True, truncation=True, max_length=8192)
    with torch.no_grad():  # gradients are not needed at inference time
        outputs = model(**inputs)
    return outputs.logits.item()

# Example usage
query = "ما هو تفسير الآية وجعلنا من الماء كل شيء حي"
passages = [
    "تعني الآية أن الماء هو عنصر أساسي في حياة جميع الكائنات الحية، وهو ضروري لاستمرار الحياة.",
    "تم اكتشاف كواكب خارج المجموعة الشمسية تحتوي على مياه متجمدة.",
    "تحدث القرآن الكريم عن البرق والرعد في عدة مواضع مختلفة."
]

# Get scores for each passage
scores = [(passage, get_relevance_score(query, passage)) for passage in passages]

# Rerank passages
reranked_passages = sorted(scores, key=lambda x: x[1], reverse=True)

# Print results
for passage, score in reranked_passages:
    print(f"Score: {score:.4f} | Passage: {passage}")
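
The function above returns a raw logit, while the scores listed under Practical Examples fall between 0 and 1. One plausible way to obtain values in that range, assuming (the source does not confirm this) that a sigmoid is applied to the logit, is sketched below. The logits are made up so the snippet runs without downloading the model, and the names sigmoid and rerank are illustrative.

```python
import math


def sigmoid(logit):
    """Map a raw logit to the interval (0, 1) using a numerically stable form."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def rerank(passages, logits):
    """Return (passage, score) pairs sorted by descending sigmoid score."""
    scored = [(p, sigmoid(z)) for p, z in zip(passages, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits standing in for get_relevance_score outputs.
passages = ["passage A", "passage B", "passage C"]
logits = [-9.2, 7.8, 0.0]

for passage, score in rerank(passages, logits):
    print(f"Score: {score:.4f} | Passage: {passage}")
```

Because the sigmoid is monotonic, it changes the scale of the scores but not the resulting order.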

Practical Examples

Example 1

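Query: ما هو القانون الجديد بشأن الضرائب في 2024؟ (What is the new tax law in 2024?)

Text	Score
نشرت الجريدة الرسمية قانوناً جديداً في 2024 ينص على زيادة الضرائب على الشركات الكبرى بنسبة 5% (The Official Gazette published a new law in 2024 that raises taxes on large companies by 5%.)	0.9989
الضرائب تعد مصدراً مهماً للدخل القومي وتختلف نسبتها من دولة إلى أخرى. (Taxes are an important source of national income, and their rates vary from country to country.)	0.0001
افتتحت الحكومة مشروعاً جديداً للطاقة المتجددة في 2024. (The government inaugurated a new renewable energy project in 2024.)	0.0001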

Example 2

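Query: ما هو تفسير الآية وجعلنا من الماء كل شيء حي (What is the interpretation of the verse "And We made from water every living thing"?)

Text	Score
تعني الآية أن الماء هو عنصر أساسي في حياة جميع الكائنات الحية، وهو ضروري لاستمرار الحياة. (The verse means that water is an essential element in the life of all living beings and is necessary for life to continue.)	0.9996
تم اكتشاف كواكب خارج المجموعة الشمسية تحتوي على مياه متجمدة. (Planets containing frozen water have been discovered outside the solar system.)	0.0000
تحدث القرآن الكريم عن البرق والرعد في عدة مواضع مختلفة. (The Holy Quran speaks of lightning and thunder in several different places.)	0.0000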

Example 3

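Query: ما هي فوائد فيتامين د؟ (What are the benefits of vitamin D?)

Text	Score
يساعد فيتامين د في تعزيز صحة العظام وتقوية الجهاز المناعي، كما يلعب دوراً مهماً في امتصاص الكالسيوم. (Vitamin D helps promote bone health and strengthen the immune system, and plays an important role in calcium absorption.)	0.9991
يستخدم فيتامين د في بعض الصناعات الغذائية كمادة حافظة. (Vitamin D is used in some food industries as a preservative.)	0.9941
يمكن الحصول على فيتامين د من خلال التعرض لأشعة الشمس أو تناول مكملات غذائية. (Vitamin D can be obtained through sun exposure or by taking dietary supplements.)	0.9938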

Applications

Mizan-Rerank-v1 is suited to Arabic NLP applications such as:

Specialized Arabic search engines
Archiving systems and digital libraries
Conversational AI applications
E-learning platforms
Information retrieval systems

Citation

If you use Mizan-Rerank-v1 in your research, please cite:

@software{Mizan_Rerank_v1_2025,
  author = {Ali Aljiachi},
  title = {Mizan-Rerank-v1: A Revolutionary Arabic Text Reranking Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Aljiachi/Mizan-Rerank-v1}
}

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

License

We release the Mizan-Rerank-v1 model weights under the Apache 2.0 license.
