Sean Cho committed
Commit b3e7847
1 Parent(s): 4c0ff9d

text style

Files changed (1):
  1. src/assets/text_content.py  +6 −6
src/assets/text_content.py CHANGED

@@ -10,7 +10,7 @@ The data used for evaluation consists of datasets to assess expertise, inference
 The evaluation dataset is exclusively private and only available for evaluation process.
 More detailed information about the benchmark dataset is provided on the “About” page.
 
-This leaderboard is co-hosted by Upstage and NIA, and operated by Upstage.
+This leaderboard is co-hosted by __Upstage__ and __NIA__, and operated by __Upstage__.
 """
 
 LLM_BENCHMARKS_TEXT = f"""
@@ -31,13 +31,13 @@ Please provide information about the model through an issue! 🤩
 ## How it works
 
 📈 We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
-- Ko-HellaSwag (provided by Upstage)
-- Ko-MMLU (provided by Upstage)
-- Ko-Arc (provided by Upstage)
-- Ko-Truthful QA (provided by Upstage)
+- Ko-HellaSwag (provided by __Upstage__)
+- Ko-MMLU (provided by __Upstage__)
+- Ko-Arc (provided by __Upstage__)
+- Ko-Truthful QA (provided by __Upstage__)
 To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing four elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from the four evaluation datasets.
 
-GPUs are provided by KT for the evaluations.
+GPUs are provided by __KT__ for the evaluations.
 
 ## Details and Logs
 - Detailed numerical results in the `results` Upstage dataset: https://huggingface.co/datasets/open-ko-llm-leaderboard/results
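The "text style" change wraps the partner names in double underscores (`__Upstage__`, `__NIA__`, `__KT__`), which render in bold wherever these strings are displayed as Markdown, presumably in the leaderboard's web UI.

As a side note on the scoring rule quoted above ("the final score is converted to the average score from the four evaluation datasets"), here is a minimal sketch, not taken from the leaderboard source; the function name and the assumption that each task score is a float on a common 0–100 scale are illustrative only:

```python
from statistics import mean

def final_score(ko_hellaswag: float, ko_mmlu: float,
                ko_arc: float, ko_truthfulqa: float) -> float:
    """Hypothetical final leaderboard score: the plain average of the four Korean benchmark scores."""
    return mean([ko_hellaswag, ko_mmlu, ko_arc, ko_truthfulqa])

# Example: task scores of 60, 45, 52 and 48 give a final score of 51.25.
print(final_score(60.0, 45.0, 52.0, 48.0))  # 51.25
```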