MCILAB committed
Commit 0ed4b45 · verified · 1 Parent(s): ad600ef

Update src/about.py

Files changed (1)
  1. src/about.py +5 -4
src/about.py CHANGED
@@ -27,7 +27,7 @@ TITLE = """<h1 align="center" id="space-title">Open Persian LLM Alignment Leader
  
  # What does your leaderboard evaluate?
  INTRODUCTION_TEXT = """
- Intro text
+ Open Persian LLM Alignment Leaderboard
  """
  
  # Which evaluations are you running? how can people reproduce what you have?
@@ -39,19 +39,20 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
  1. Translated datasets (adapted from established English benchmarks)
  2. Synthetically generated data (newly created for Persian LLMs)
  3. Naturally collected data (reflecting indigenous cultural nuances)
- Key Datasets in the Benchmark
+
+ ### Key Datasets in the Benchmark
  The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
  Translated Datasets
  • Anthropic-fa
  • AdvBench-fa
  • HarmBench-fa
  • DecodingTrust-fa
- Newly Developed Persian Datasets
+ ### Newly Developed Persian Datasets
  • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
  • SafeBench-fa: Assesses safety in generated outputs.
  • FairBench-fa: Measures bias mitigation in Persian LLMs.
  • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
- Naturally Collected Persian Dataset
+ ### Naturally Collected Persian Dataset
  • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
  A Unified Framework for Persian LLM Evaluation
  By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects: