Update README.md (#2)
- Update README.md (fc7169c6a560797f9608c2d4b82e738c00d8ca0a)
Co-authored-by: Hailey LIM <[email protected]>
README.md
CHANGED
@@ -81,39 +81,28 @@ The model was trained on approximately 100k high-quality Korean instruction exam
 
 The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).
 
-| Benchmark
-
-| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench)
-| [ko-
-| [ko-
-| [ko-
-| [
-| [ko-
-| [ko-
-| [ko-
-| [ko-ged-high](https://huggingface.co/datasets/davidkim205/ko-ged-high) | Korean high school GED multiple-choice question dataset | ged\:H |
-| [ko-ged2-elementary](https://huggingface.co/datasets/davidkim205/ko-ged2-middle) | Korean elementary school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:E |
-| [ko-ged2-middle](https://huggingface.co/datasets/davidkim205/ko-ged2-elementary) | Korean middle school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:M |
-| [ko-ged2-high](https://huggingface.co/datasets/davidkim205/ko-ged2-high) | Korean high school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:H |
-| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
-| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| Benchmark | Description | Abbreviation |
+|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
+| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
+| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
+| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |
+| [ko-ged-mc-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-mc-elementary) | Korean elementary school GED multiple-choice question dataset | ged\:E |
+| [ko-ged-mc-middle](https://huggingface.co/datasets/davidkim205/ko-ged-mc-middle) | Korean middle school GED multiple-choice question dataset | ged\:M |
+| [ko-ged-mc-high](https://huggingface.co/datasets/davidkim205/ko-ged-mc-high) | Korean high school GED multiple-choice question dataset | ged\:H |
+| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
+| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
 
 ### Benchmark Results
 
-| |
-
-| Avg. |
-| bench |
-|
-| ged
-|
-|
-|
-|
-|
-
-| ged2:E | 9.60 | 9.60 | **9.66** | 9.60 | 9.48 |
-| ged2:M | 9.37 | **9.54** | **9.54** | 9.16 | 8.95 |
-| ged2:H | **9.32** | 9.24 | 9.24 | 9.28 | 8.84 |
-| gpqa | **3.18** | 2.88 | 2.98 | 2.68 | 3.13 |
-| math500 | 5.60 | 5.58 | **5.70** | 4.80 | 4.88 |
+| | **davidkim205<br>Hunminai<br>-1.0-12b** | google<br>gemma-3<br>-12b-it | unsloth<br>gemma-3<br>-12b-it | K-intelligence<br>Midm-2.0<br>-Base-Instruct | LGAI-EXAONE<br>EXAONE-3.5<br>-7.8B-Instruct |
+|---------|----------------------------------------:|-----------------------------:|------------------------------:|---------------------------------------------:|--------------------------------------------:|
+| Avg. | **7.80** | 7.75 | 7.71 | 7.54 | 7.31 |
+| bench | 7.96 | 8.00 | 7.83 | **8.01** | 7.70 |
+| ged | 8.65 | 8.61 | **8.73** | 8.10 | 8.25 |
+| ged:E | **9.72** | **9.72** | 9.51 | **9.72** | 9.65 |
+| ged:M | **9.63** | 9.55 | 9.39 | 9.31 | 9.10 |
+| ged:H | 9.32 | 9.36 | 9.24 | **9.48** | 9.00 |
+| gpqa | **3.18** | 2.88 | 2.98 | 2.68 | 3.13 |
+| math500 | 5.60 | 5.58 | **5.70** | 4.80 | 4.88 |
+| ifeval | **8.37** | 8.30 | 8.33 | 8.24 | 6.76 |
+
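The README does not say how the `Avg.` row in the added results table is computed. A minimal Python sketch, assuming it is the unweighted mean of the eight benchmark rows (bench, ged, ged:E, ged:M, ged:H, gpqa, math500, ifeval) rounded to two decimals, can be used to check the row when the table is next updated. The dictionary keys are joined from the table's header cells for readability and may not match the exact repository names; `SCORES`, `unweighted_average`, and `AVERAGES` are illustrative names, not part of the README.

```python
# Scores copied from the added "Benchmark Results" table, one list per model,
# in row order: bench, ged, ged:E, ged:M, ged:H, gpqa, math500, ifeval.
# Keys are assembled from the table header cells (an assumption, not verified IDs).
SCORES = {
    "davidkim205/Hunminai-1.0-12b": [7.96, 8.65, 9.72, 9.63, 9.32, 3.18, 5.60, 8.37],
    "google/gemma-3-12b-it": [8.00, 8.61, 9.72, 9.55, 9.36, 2.88, 5.58, 8.30],
    "unsloth/gemma-3-12b-it": [7.83, 8.73, 9.51, 9.39, 9.24, 2.98, 5.70, 8.33],
    "K-intelligence/Midm-2.0-Base-Instruct": [8.01, 8.10, 9.72, 9.31, 9.48, 2.68, 4.80, 8.24],
    "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct": [7.70, 8.25, 9.65, 9.10, 9.00, 3.13, 4.88, 6.76],
}


def unweighted_average(scores):
    """Return the plain mean of the scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


# Assumed definition of the Avg. row: every benchmark counts equally.
AVERAGES = {model: unweighted_average(s) for model, s in SCORES.items()}

for model, avg in AVERAGES.items():
    print(f"{model}: {avg:.2f}")
```

Under this assumption the computed values agree with the `Avg.` row as committed (7.80, 7.75, 7.71, 7.54, 7.31), which suggests no benchmark is weighted more heavily than another.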