Update README.md (#2)
Update README.md (c648f605cc184c2fa1e8580c4492f48af0a8de9f)
Co-authored-by: Hailey LIM <[email protected]>
README.md
CHANGED
@@ -81,42 +81,29 @@ The model was trained on approximately 100k high-quality Korean instruction exam
 The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).

-| [ko-ged-high](https://huggingface.co/datasets/davidkim205/ko-ged-high) | Korean high school GED multiple-choice question dataset | ged\:H |
-| [ko-ged2-elementary](https://huggingface.co/datasets/davidkim205/ko-ged2-middle) | Korean elementary school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:E |
-| [ko-ged2-middle](https://huggingface.co/datasets/davidkim205/ko-ged2-elementary) | Korean middle school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:M |
-| [ko-ged2-high](https://huggingface.co/datasets/davidkim205/ko-ged2-high) | Korean high school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:H |
-| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
-| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| Benchmark | Description | Abbreviation |
+|-----------|-------------|--------------|
+| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
+| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
+| [ko-ged-mc-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-mc-elementary) | Korean elementary school GED multiple-choice question dataset | ged\:E |
+| [ko-ged-mc-middle](https://huggingface.co/datasets/davidkim205/ko-ged-mc-middle) | Korean middle school GED multiple-choice question dataset | ged\:M |
+| [ko-ged-mc-high](https://huggingface.co/datasets/davidkim205/ko-ged-mc-high) | Korean high school GED multiple-choice question dataset | ged\:H |
+| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
+| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |

 ### Benchmark Results

-| ged:H | **9.60** | 9.52 | 9.52 | 9.32 |
-| ged2:E | 9.77 | 9.89 | **9.94** | 9.48 |
-| ged2:M | **9.75** | 9.58 | 9.46 | 9.33 |
-| ged2:H | **9.48** | 9.23 | 9.40 | 9.08 |
-| gpqa | **4.55** | 3.69 | 3.38 | 3.54 |
-| math500 | **8.56** | 8.38 | 6.26 | 5.00 |
+|         | **davidkim205<br>Hunminai<br>-1.0-27b** | google<br>gemma-3<br>-27b-it | unsloth<br>gemma-3<br>-27b-it | google<br>gemma-2<br>-27b-it |
+|---------|----------------------------------------:|-----------------------------:|------------------------------:|-----------------------------:|
+| Avg.    | **8.53** | 8.31 | 8.03 | 7.49 |
+| bench   | 8.26 | 8.06 | **8.27** | 7.59 |
+| ged     | **9.19** | 9.02 | 9.03 | 8.38 |
+| ged:E   | 9.86 | 9.86 | **9.93** | 9.51 |
+| ged:M   | 9.67 | 9.63 | **9.76** | 9.10 |
+| ged:H   | **9.60** | 9.52 | 9.52 | 9.32 |
+| gpqa    | **4.55** | 3.69 | 3.38 | 3.54 |
+| math500 | **8.56** | 8.38 | 6.26 | 5.00 |
+| ifeval  |          |      | **8.10** |      |
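For reference, the benchmark datasets linked in the updated table are hosted on the Hugging Face Hub and can typically be pulled with the `datasets` library. The snippet below is a minimal sketch outside the commit itself, assuming `davidkim205/ko-bench` loads with its default configuration; the split name and field layout may differ per dataset.

```python
# Minimal sketch: load one of the Korean benchmark datasets from the Hub.
# Assumes the dataset loads with its default configuration; adjust the
# repo id, split name, and field names to the dataset you inspect.
from datasets import load_dataset

ds = load_dataset("davidkim205/ko-bench", split="train")

print(ds)     # dataset size and column names
print(ds[0])  # first example (fields vary per benchmark)
```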
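The Avg. row in the results table appears to be the plain arithmetic mean of the per-benchmark scores available for each model; as a quick check under that assumption, the sketch below uses the values from the Hunminai-1.0-27b column and reproduces the reported 8.53.

```python
# Quick arithmetic check of the Avg. row, assuming it is the mean of the
# per-benchmark scores listed for a model (ifeval is blank for this column
# and is therefore left out).
scores = {
    "bench": 8.26, "ged": 9.19, "ged:E": 9.86, "ged:M": 9.67,
    "ged:H": 9.60, "gpqa": 4.55, "math500": 8.56,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # -> 8.53
```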