Modified introductions
utils.py
CHANGED
@@ -28,26 +28,27 @@ COLUMN_NAMES = MODEL_INFO
|
|
28 |
LEADERBOARD_INTRODUCTION = """# Chumor Leaderboard
|
29 |
|
30 |
## Introduction
|
31 |
-
We
|
32 |
|
33 |
-
Note: For inclusion in our leaderboard, submissions must provide substantial evidence demonstrating that their system is a genuine language model. We maintain strict verification standards to ensure the integrity and comparability of the results.
|
34 |
|
35 |
## What's new about MMLU-Pro
|
36 |
|
37 |
Compared to the original MMLU, there are three major differences:
|
38 |
|
39 |
-
|
40 |
-
- The original MMLU dataset contains mostly knowledge-driven questions without requiring much reasoning. Therefore, PPL results are normally better than CoT. In our dataset, we increase the problem difficulty and integrate more reasoning-focused problems. In MMLU-Pro, CoT can be 20% higher than PPL.
|
41 |
-
- By increasing the distractor numbers, we significantly reduce the probability of correct guess by chance to boost the benchmark’s robustness. Specifically, with 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro.
|
42 |
|
43 |
-
For detailed information about the dataset, visit our page on Hugging Face: https://huggingface.co/datasets/
|
44 |
|
45 |
-
If you are interested in replicating these results or wish to evaluate your models using our dataset, access our evaluation scripts available on GitHub: https://github.com/
|
46 |
|
47 |
-
If you would like to learn more details about our dataset, please check out our paper: https://arxiv.org/abs/2406.
|
48 |
|
49 |
Below you can find the accuracies of different models tested on this dataset.
|
50 |
|
51 |
"""
|
52 |
|
53 |
TABLE_INTRODUCTION = """
|
@@ -74,23 +75,17 @@ CITATION_BUTTON_TEXT = r"""
|
|
74 |
|
75 |
SUBMIT_INTRODUCTION = """# Submit on MMLU-Pro Leaderboard Introduction
|
76 |
|
77 |
-
## ⚠ Please note that you need to submit the
|
78 |
-
|
79 |
-
```
|
80 |
-
|
81 |
-
|
82 |
-
|
83 |
-
|
84 |
-
|
85 |
-
"answer": "ABC",
|
86 |
-
"answer_index": 1,
|
87 |
-
"category": "abc,
|
88 |
-
"pred": "B",
|
89 |
-
"model_outputs": ""
|
90 |
-
}, ...
|
91 |
-
]
|
92 |
```
|
93 |
-
|
94 |
"""
|
95 |
|
96 |
|
|
|
28 |
LEADERBOARD_INTRODUCTION = """# Chumor Leaderboard
|
29 |
|
30 |
## Introduction
|
31 |
+
We construct Chumor, the first Chinese humor explanation dataset that exceeds the size of existing humor datasets. Chumor is sourced from Ruo Zhi Ba (弱智吧), a Chinese Reddit-like platform known for sharing intellectually challenging and culturally specific jokes.
|
32 |
|
33 |
|
34 |
## What's new about MMLU-Pro
|
35 |
|
36 |
Compared to the original MMLU, there are three major differences:
|
37 |
|
38 |
+
Unlike existing datasets that focus on tasks such as humor detection, punchline identification, or humor generation, Chumor addresses the challenge of humor explanation. This involves not just identifying humor but understanding the reasoning behind it, a task that requires both linguistic and cultural knowledge. Specifically, Chumor tasks the LLMs with determining whether an explanation fully explains the joke. We source the explanations from GPT-4o and ERNIE-4-turbo, and have the entire dataset manually annotated by five native Chinese speakers.
|
39 |
|
40 |
+
For detailed information about the dataset, visit our page on Hugging Face: https://huggingface.co/datasets/dnaihao/Chumor.
|
41 |
|
42 |
+
If you are interested in replicating these results or wish to evaluate your models using our dataset, access our evaluation scripts available on GitHub: https://github.com/dnaihao/Chumor-dataset.
|
43 |
|
44 |
+
If you would like to learn more details about our dataset, please check out our paper: https://arxiv.org/abs/2406.12754.
|
45 |
|
46 |
Below you can find the accuracies of different models tested on this dataset.
|
47 |
|
48 |
+
### Acknowledgements
|
49 |
+
|
50 |
+
We built this leaderboard from the template provided by https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro.
|
51 |
+
|
52 |
"""
|
53 |
|
54 |
TABLE_INTRODUCTION = """
|
|
|
75 |
|
76 |
SUBMIT_INTRODUCTION = """# Submit on MMLU-Pro Leaderboard Introduction
|
77 |
|
78 |
+
## ⚠ Please note that you need to submit the CSV file with the following format:
|
79 |
+
|
80 |
+
```csv
|
81 |
+
labels
|
82 |
+
good
|
83 |
+
good
|
84 |
+
bad
|
85 |
+
...
|
86 |
```
|
87 |
+
|
88 |
+
You can generate an output file in the above format using the evaluation script in our GitHub repository, which also includes detailed instructions: https://github.com/dnaihao/Chumor-dataset. After generating the file, please email it to us at [email protected] as an attachment.
|
89 |
"""
|
90 |
|
91 |
|
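The CSV format described in the new `SUBMIT_INTRODUCTION` (a single `labels` header followed by one prediction per row) could be produced and sanity-checked with something like the sketch below. This is not the evaluation script from the repository. The helper names `write_submission` and `read_submission` are hypothetical, and the assumption that `good` and `bad` are the only valid labels is inferred from the sample rows in the diff.

```python
import csv

# Assumption: the sample rows in the diff show only these two label values.
ALLOWED_LABELS = {"good", "bad"}


def write_submission(labels, path):
    """Write predictions as a one-column CSV with a `labels` header."""
    labels = list(labels)
    invalid = sorted({label for label in labels if label not in ALLOWED_LABELS})
    if invalid:
        raise ValueError(f"unexpected labels: {invalid}")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["labels"])
        writer.writerows([label] for label in labels)


def read_submission(path):
    """Read a submission file back and check the header and label values."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["labels"]:
            raise ValueError(f"expected a single 'labels' header, got {header}")
        labels = [row[0] for row in reader if row]
    invalid = sorted({label for label in labels if label not in ALLOWED_LABELS})
    if invalid:
        raise ValueError(f"unexpected labels: {invalid}")
    return labels
```

Calling `write_submission(["good", "good", "bad"], "submission.csv")` would write a file matching the example in the diff. Running `read_submission` on it before emailing is a cheap way to catch a missing header or a stray label.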