davidkim205 and Doleeee committed (verified)
Commit 9a3730f · 1 parent: 3a235bc

Update README.md (#2)

- Update README.md (c648f605cc184c2fa1e8580c4492f48af0a8de9f)


Co-authored-by: Hailey LIM <[email protected]>

Files changed (1)
  1. README.md +21 -34
README.md CHANGED
@@ -81,42 +81,29 @@ The model was trained on approximately 100k high-quality Korean instruction exam
 
 The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).
 
-| Benchmark | Description | Abbreviation |
-|-----------|-------------|--------------|
-| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
-| [ko-bench-v2](https://huggingface.co/datasets/davidkim205/ko-bench-v2) | Dataset including new questions and answers following the ko-bench format | bench2 |
-| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
-| [ko-ged2](https://huggingface.co/datasets/davidkim205/ko-ged2) | Korean GED open-ended question dataset for the 2025 1st Korean GED Exam, covering all subjects | ged2 |
-| [tiny-eval](https://huggingface.co/datasets/davidkim205/tiny-eval) | High-quality evaluation dataset designed to assess overall model performance with a small amount of data | tiny |
-| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |
-| [ko-ged-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-elementary) | Korean elementary school GED multiple-choice question dataset | ged:E |
-| [ko-ged-middle](https://huggingface.co/datasets/davidkim205/ko-ged-middle) | Korean middle school GED multiple-choice question dataset | ged:M |
-| [ko-ged-high](https://huggingface.co/datasets/davidkim205/ko-ged-high) | Korean high school GED multiple-choice question dataset | ged:H |
-| [ko-ged2-elementary](https://huggingface.co/datasets/davidkim205/ko-ged2-middle) | Korean elementary school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2:E |
-| [ko-ged2-middle](https://huggingface.co/datasets/davidkim205/ko-ged2-elementary) | Korean middle school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2:M |
-| [ko-ged2-high](https://huggingface.co/datasets/davidkim205/ko-ged2-high) | Korean high school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2:H |
-| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
-| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| Benchmark | Description | Abbreviation |
+|-----------|-------------|--------------|
+| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
+| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
+| [ko-ged-mc-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-mc-elementary) | Korean elementary school GED multiple-choice question dataset | ged:E |
+| [ko-ged-mc-middle](https://huggingface.co/datasets/davidkim205/ko-ged-mc-middle) | Korean middle school GED multiple-choice question dataset | ged:M |
+| [ko-ged-mc-high](https://huggingface.co/datasets/davidkim205/ko-ged-mc-high) | Korean high school GED multiple-choice question dataset | ged:H |
+| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
+| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |
 
 ### Benchmark Results
 
 
-|         | **davidkim205<br>ko-gemma<br/>-3-27b** | google<br>gemma-3<br>-27b-it | unsloth<br>gemma-3<br>-27b-it | google<br>gemma-2<br>-27b-it |
-|---------|---------------------------------------:|-----------------------------:|------------------------------:|-----------------------------:|
-| Avg.    | **8.83** | 8.74     | 8.56     | 8.08 |
-| bench   | 8.26     | 8.06     | **8.27** | 7.59 |
-| bench2  | 8.74     | **8.79** | 8.73     | 8.21 |
-| ged     | **9.19** | 9.02     | 9.03     | 8.38 |
-| ged2    | 8.90     | **8.91** | 8.98     | 8.38 |
-| tiny    | 8.50     | 9.08     | **9.12** |      |
-| ifeval  |          |          | **8.10** |      |
-| ged:E   | 9.86     | 9.86     | **9.93** | 9.51 |
-| ged:M   | 9.67     | 9.63     | **9.76** | 9.10 |
-| ged:H   | **9.60** | 9.52     | 9.52     | 9.32 |
-| ged2:E  | 9.77     | 9.89     | **9.94** | 9.48 |
-| ged2:M  | **9.75** | 9.58     | 9.46     | 9.33 |
-| ged2:H  | **9.48** | 9.23     | 9.40     | 9.08 |
-| gpqa    | **4.55** | 3.69     | 3.38     | 3.54 |
-| math500 | **8.56** | 8.38     | 6.26     | 5.00 |
-
+|         | **davidkim205<br>Hunminai<br>-1.0-27b** | google<br>gemma-3<br>-27b-it | unsloth<br>gemma-3<br>-27b-it | google<br>gemma-2<br>-27b-it |
+|---------|----------------------------------------:|-----------------------------:|------------------------------:|-----------------------------:|
+| Avg.    | **8.53** | 8.31     | 8.03     | 7.49 |
+| bench   | 8.26     | 8.06     | **8.27** | 7.59 |
+| ged     | **9.19** | 9.02     | 9.03     | 8.38 |
+| ged:E   | 9.86     | 9.86     | **9.93** | 9.51 |
+| ged:M   | 9.67     | 9.63     | **9.76** | 9.10 |
+| ged:H   | **9.60** | 9.52     | 9.52     | 9.32 |
+| gpqa    | **4.55** | 3.69     | 3.38     | 3.54 |
+| math500 | **8.56** | 8.38     | 6.26     | 5.00 |
+| ifeval  |          |          | **8.10** |      |
 
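
The "Avg." column in the updated results table can be reproduced as the mean of each model's available scores, with blank cells (the missing ifeval entries) excluded from the denominator. A minimal sketch, assuming that averaging rule — it is inferred from the numbers, not stated in the README:

```python
# Sanity-check of the "Avg." column in the Benchmark Results table.
# Scores are copied from the table in row order bench, ged, ged:E,
# ged:M, ged:H, gpqa, math500, ifeval; None marks a blank cell.
# Assumption (inferred, not documented): the average is taken only
# over the scores that are present.

SCORES = {
    "davidkim205/Hunminai-1.0-27b": [8.26, 9.19, 9.86, 9.67, 9.60, 4.55, 8.56, None],
    "google/gemma-3-27b-it":        [8.06, 9.02, 9.86, 9.63, 9.52, 3.69, 8.38, None],
    "unsloth/gemma-3-27b-it":       [8.27, 9.03, 9.93, 9.76, 9.52, 3.38, 6.26, 8.10],
    "google/gemma-2-27b-it":        [7.59, 8.38, 9.51, 9.10, 9.32, 3.54, 5.00, None],
}

REPORTED_AVG = {
    "davidkim205/Hunminai-1.0-27b": 8.53,
    "google/gemma-3-27b-it": 8.31,
    "unsloth/gemma-3-27b-it": 8.03,
    "google/gemma-2-27b-it": 7.49,
}

def mean_of_present(values):
    """Average over non-blank scores, rounded to two decimals."""
    present = [v for v in values if v is not None]
    return round(sum(present) / len(present), 2)

for model, values in SCORES.items():
    assert mean_of_present(values) == REPORTED_AVG[model], model
```

Note that under this rule unsloth/gemma-3-27b-it is averaged over eight scores (its ifeval result is included), while the other three models are averaged over seven.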