MCILAB committed
Commit abf9d9d · verified · 1 Parent(s): 0ed4b45

Update src/about.py

Files changed (1):
  1. src/about.py +6 -2
src/about.py CHANGED
@@ -35,7 +35,8 @@ LLM_BENCHMARKS_TEXT = f"""
 ## Open Persian LLM Alignment Leaderboard
 
 Developed by MCILAB in collaboration with the Machine Learning Laboratory at Sharif University of Technology, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
-Addressing the gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts. It combines three types of Persian-language benchmarks:
+Addressing the gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts.
+### It combines three types of Persian-language benchmarks:
 1. Translated datasets (adapted from established English benchmarks)
 2. Synthetically generated data (newly created for Persian LLMs)
 3. Naturally collected data (reflecting indigenous cultural nuances)
@@ -54,11 +55,14 @@ Translated Datasets
 • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
 ### Naturally Collected Persian Dataset
 • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
-A Unified Framework for Persian LLM Evaluation
+
+### A Unified Framework for Persian LLM Evaluation
 By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
 • Safety: Avoiding harmful or toxic content.
 • Fairness: Mitigating biases in model outputs.
 • Social Norms: Ensuring culturally appropriate behavior.
+
+
 This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
 
 """