davidkim205 committed · Commit 9fa844e · Parent(s): a63fc36

Update README.md

README.md CHANGED
@@ -37,20 +37,26 @@ korean multi-task instruction dataset

(previous version: "Refer github" under ## Training, and an evaluation-table stub containing only a `model` header and a `gpt-3.5-turbo` row)
- CUDA Version: 12.2

## Training

Refer to https://github.com/davidkim205/komt
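A minimal inference sketch may help readers get started. Note that the Hub model id (`davidkim205/komt-llama2-7b-v1`) and the `### instruction:` prompt template below are assumptions, not confirmed by this card; verify both against the repository above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the instruction-following template.

    The template format is an assumption; check the komt repository for
    the authoritative prompt layout.
    """
    return f"### instruction: {instruction}\n\n### Response: "


def generate(instruction: str) -> str:
    """Load the model and generate a response (downloads weights on first call).

    Requires `pip install transformers accelerate`; the Hub id is assumed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "davidkim205/komt-llama2-7b-v1"  # assumed Hub id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# "Why do cats hate water?" -- a sample Korean instruction
print(build_prompt("고양이는 왜 물을 싫어하나요?"))
```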
## Evaluation

For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT, a widely used model, as described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).

| model                                   | score   | average(0~5) | percentage |
| --------------------------------------- | ------- | ------------ | ---------- |
| gpt-3.5-turbo(close)                    | 147     | 3.97         | 79.45%     |
| naver Cue(close)                        | 140     | 3.78         | 75.67%     |
| clova X(close)                          | 136     | 3.67         | 73.51%     |
| WizardLM-13B-V1.2(open)                 | 96      | 2.59         | 51.89%     |
| Llama-2-7b-chat-hf(open)                | 67      | 1.81         | 36.21%     |
| Llama-2-13b-chat-hf(open)               | 73      | 1.91         | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70      | 1.89         | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat(open)          | 96      | 2.59         | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B(open)     | 100     | 2.70         | 54.05%     |
| **komt-llama2-7b-v1 (open)(ours)**      | **117** | **3.16**     | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)**     | **129** | **3.48**     | **69.72%** |
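The `average(0~5)` and `percentage` columns follow from the raw `score`. The prompt count is not stated in this card, but the ratios imply 37 prompts scored 0–5 each (a maximum total of 185), with values truncated to two decimals; a quick sketch of that arithmetic, under those inferred assumptions:

```python
# Sanity-check of the score-table arithmetic. The prompt count (37) is not
# stated in the card; it is inferred from the ratios (e.g. 147 / 3.97 ≈ 37),
# giving a maximum total score of 37 * 5 = 185. Displayed values appear to be
# truncated (not rounded) to two decimals.
N_PROMPTS = 37          # inferred, not stated in the card
MAX_SCORE = N_PROMPTS * 5


def truncate2(x: float) -> float:
    """Truncate to two decimal places, matching the card's apparent style."""
    return int(x * 100) / 100


def summarize(score: int) -> tuple[float, float]:
    """Return (average on the 0~5 scale, percentage of the 185-point maximum)."""
    return truncate2(score / N_PROMPTS), truncate2(100 * score / MAX_SCORE)


print(summarize(147))  # gpt-3.5-turbo row -> (3.97, 79.45)
print(summarize(117))  # komt-llama2-7b-v1 row -> (3.16, 63.24)
```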
------------------------------------------------

# Original model card: Meta's Llama 2 7B-chat