Update src/about.py
src/about.py +5 -4
src/about.py
CHANGED
@@ -27,7 +27,7 @@ TITLE = """<h1 align="center" id="space-title">Open Persian LLM Alignment Leader
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+Open Persian LLM Alignment Leaderboard
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
@@ -39,19 +39,20 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 1. Translated datasets (adapted from established English benchmarks)
 2. Synthetically generated data (newly created for Persian LLMs)
 3. Naturally collected data (reflecting indigenous cultural nuances)
-
+
+### Key Datasets in the Benchmark
 The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
 Translated Datasets
 • Anthropic-fa
 • AdvBench-fa
 • HarmBench-fa
 • DecodingTrust-fa
-Newly Developed Persian Datasets
+### Newly Developed Persian Datasets
 • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
 • SafeBench-fa: Assesses safety in generated outputs.
 • FairBench-fa: Measures bias mitigation in Persian LLMs.
 • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-Naturally Collected Persian Dataset
+### Naturally Collected Persian Dataset
 • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 A Unified Framework for Persian LLM Evaluation
 By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
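For readers who want to work with the dataset grouping shown in this diff programmatically, one way to express it is a plain mapping from group to dataset names. This is only a sketch: `BENCHMARK_DATASETS` and `group_of` are hypothetical names chosen for illustration and do not appear in src/about.py; the group keys simply mirror the three headings in the changed text.

```python
# Hypothetical structure (not part of src/about.py): dataset names copied
# from the diff, grouped under keys that mirror the three headings.
BENCHMARK_DATASETS = {
    "translated": [
        "Anthropic-fa",
        "AdvBench-fa",
        "HarmBench-fa",
        "DecodingTrust-fa",
    ],
    "newly_developed": [
        "ProhibiBench-fa",
        "SafeBench-fa",
        "FairBench-fa",
        "SocialBench-fa",
    ],
    "naturally_collected": [
        "GuardBench-fa",
    ],
}


def group_of(dataset: str) -> str:
    """Return the group key for a dataset name; raise KeyError if unknown."""
    for group, names in BENCHMARK_DATASETS.items():
        if dataset in names:
            return group
    raise KeyError(dataset)
```

A lookup such as `group_of("HarmBench-fa")` returns the key of the group that lists that dataset, which could be used to aggregate per-group scores if the leaderboard reports them that way.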