Files changed (1)
README.md (+22 -33)
@@ -81,39 +81,28 @@ The model was trained on approximately 100k high-quality Korean instruction exam
 
 The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).
 
-| Benchmark | Description | Abbreviation |
-|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
-| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
-| [ko-bench-v2](https://huggingface.co/datasets/davidkim205/ko-bench-v2) | Dataset including new questions and answers following the ko-bench format | bench2 |
-| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
-| [ko-ged2](https://huggingface.co/datasets/davidkim205/ko-ged2) | Korean GED open-ended question dataset for the 2025 1st Korean GED Exam, covering all subjects | ged2 |
-| [tiny-eval](https://huggingface.co/datasets/davidkim205/tiny-eval) | High-quality evaluation dataset designed to assess overall model performance with a small amount of data | tiny |
-| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |
-| [ko-ged-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-elementary) | Korean elementary school GED multiple-choice question dataset | ged\:E |
-| [ko-ged-middle](https://huggingface.co/datasets/davidkim205/ko-ged-middle) | Korean middle school GED multiple-choice question dataset | ged\:M |
-| [ko-ged-high](https://huggingface.co/datasets/davidkim205/ko-ged-high) | Korean high school GED multiple-choice question dataset | ged\:H |
-| [ko-ged2-elementary](https://huggingface.co/datasets/davidkim205/ko-ged2-middle) | Korean elementary school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:E |
-| [ko-ged2-middle](https://huggingface.co/datasets/davidkim205/ko-ged2-elementary) | Korean middle school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:M |
-| [ko-ged2-high](https://huggingface.co/datasets/davidkim205/ko-ged2-high) | Korean high school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:H |
-| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
-| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| Benchmark | Description | Abbreviation |
+|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
+| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
+| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
+| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |
+| [ko-ged-mc-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-mc-elementary) | Korean elementary school GED multiple-choice question dataset | ged\:E |
+| [ko-ged-mc-middle](https://huggingface.co/datasets/davidkim205/ko-ged-mc-middle) | Korean middle school GED multiple-choice question dataset | ged\:M |
+| [ko-ged-mc-high](https://huggingface.co/datasets/davidkim205/ko-ged-mc-high) | Korean high school GED multiple-choice question dataset | ged\:H |
+| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
+| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
 
 ### Benchmark Results
 
-| | **davidkim205<br>ko-gemma<br>-3-12b** | google<br>gemma-3<br>-12b-it | unsloth<br>gemma-3<br>-12b-it | K-intelligence<br>Midm-2.0<br>-Base-Instruct | LGAI-EXAONE<br>EXAONE-3.5<br>-7.8B-Instruct |
-|---------|---------------------------------------:|-----------------------------:|-------------------------------:|----------------------------------------------:|---------------------------------------------:|
-| Avg. | **8.26** | 8.22 | 8.20 | 8.12 | 7.85 |
-| bench | 7.96 | 8.00 | 7.83 | **8.01** | 7.70 |
-| bench2 | 8.39 | 8.23 | **8.44** | 8.21 | 8.01 |
-| ged | 8.65 | 8.61 | **8.73** | 8.10 | 8.25 |
-| ged2 | 8.17 | 8.17 | 8.31 | **8.84** | 8.06 |
-| tiny | **8.33** | **8.33** | 7.88 | 8.25 | 8.12 |
-| ifeval | **8.37** | 8.30 | 8.33 | 8.24 | 6.76 |
-| ged:E | **9.72** | **9.72** | 9.51 | **9.72** | 9.65 |
-| ged:M | **9.63** | 9.55 | 9.39 | 9.31 | 9.10 |
-| ged:H | 9.32 | 9.36 | 9.24 | **9.48** | 9.00 |
-| ged2:E | 9.60 | 9.60 | **9.66** | 9.60 | 9.48 |
-| ged2:M | 9.37 | **9.54** | **9.54** | 9.16 | 8.95 |
-| ged2:H | **9.32** | 9.24 | 9.24 | 9.28 | 8.84 |
-| gpqa | **3.18** | 2.88 | 2.98 | 2.68 | 3.13 |
-| math500 | 5.60 | 5.58 | **5.70** | 4.80 | 4.88 |
+| | **davidkim205<br>Hunminai<br>-1.0-12b** | google<br>gemma-3<br>-12b-it | unsloth<br>gemma-3<br>-12b-it | K-intelligence<br>Midm-2.0<br>-Base-Instruct | LGAI-EXAONE<br>EXAONE-3.5<br>-7.8B-Instruct |
+|---------|----------------------------------------:|-----------------------------:|------------------------------:|---------------------------------------------:|--------------------------------------------:|
+| Avg. | **7.80** | 7.75 | 7.71 | 7.54 | 7.31 |
+| bench | 7.96 | 8.00 | 7.83 | **8.01** | 7.70 |
+| ged | 8.65 | 8.61 | **8.73** | 8.10 | 8.25 |
+| ged:E | **9.72** | **9.72** | 9.51 | **9.72** | 9.65 |
+| ged:M | **9.63** | 9.55 | 9.39 | 9.31 | 9.10 |
+| ged:H | 9.32 | 9.36 | 9.24 | **9.48** | 9.00 |
+| gpqa | **3.18** | 2.88 | 2.98 | 2.68 | 3.13 |
+| math500 | 5.60 | 5.58 | **5.70** | 4.80 | 4.88 |
+| ifeval | **8.37** | 8.30 | 8.33 | 8.24 | 6.76 |
+
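The Avg. row in the updated results table appears to be the unweighted mean of the eight per-benchmark scores; a minimal sketch checking this for the davidkim205/Hunminai-1.0-12b column (scores copied from the table above; the assumption is the plain arithmetic mean, which reproduces the reported 7.80):

```python
# Per-benchmark scores for davidkim205/Hunminai-1.0-12b, taken from the results table.
scores = {
    "bench": 7.96, "ged": 8.65, "ged:E": 9.72, "ged:M": 9.63,
    "ged:H": 9.32, "gpqa": 3.18, "math500": 5.60, "ifeval": 8.37,
}

# Avg. as the unweighted mean over the eight benchmarks, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 7.8, matching the table's 7.80
```

The same calculation reproduces the other columns as well (e.g. 7.75 for google/gemma-3-12b-it), which supports the unweighted-mean reading.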