davidkim205 committed · Commit 9fa844e · Parent(s): a63fc36

Update README.md

README.md CHANGED
@@ -37,20 +37,26 @@ korean multi-task instruction dataset

(previous version: "Refer github" under ## Training, and an evaluation-table stub containing only a `model` header and a `gpt-3.5-turbo` row)
- CUDA Version: 12.2

## Training

Refer to https://github.com/davidkim205/komt
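A minimal inference sketch may help readers get started. Note that the Hub model id (`davidkim205/komt-llama2-7b-v1`) and the `### instruction:` prompt template below are assumptions, not confirmed by this card; verify both against the repository above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the instruction-following template.

    The template format is an assumption; check the komt repository for
    the authoritative prompt layout.
    """
    return f"### instruction: {instruction}\n\n### Response: "


def generate(instruction: str) -> str:
    """Load the model and generate a response (downloads weights on first call).

    Requires `pip install transformers accelerate`; the Hub id is assumed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "davidkim205/komt-llama2-7b-v1"  # assumed Hub id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# "Why do cats hate water?" -- a sample Korean instruction
print(build_prompt("고양이는 왜 물을 싫어하나요?"))
```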
## Evaluation

For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT, a widely used model, as described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).

| model                                   | score   | average(0~5) | percentage |
| --------------------------------------- | ------- | ------------ | ---------- |
| gpt-3.5-turbo(close)                    | 147     | 3.97         | 79.45%     |
| naver Cue(close)                        | 140     | 3.78         | 75.67%     |
| clova X(close)                          | 136     | 3.67         | 73.51%     |
| WizardLM-13B-V1.2(open)                 | 96      | 2.59         | 51.89%     |
| Llama-2-7b-chat-hf(open)                | 67      | 1.81         | 36.21%     |
| Llama-2-13b-chat-hf(open)               | 73      | 1.91         | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70      | 1.89         | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat(open)          | 96      | 2.59         | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B(open)     | 100     | 2.70         | 54.05%     |
| **komt-llama2-7b-v1 (open)(ours)**      | **117** | **3.16**     | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)**     | **129** | **3.48**     | **69.72%** |
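The `average(0~5)` and `percentage` columns follow from the raw `score`. The prompt count is not stated in this card, but the ratios imply 37 prompts scored 0–5 each (a maximum total of 185), with values truncated to two decimals; a quick sketch of that arithmetic, under those inferred assumptions:

```python
# Sanity-check of the score-table arithmetic. The prompt count (37) is not
# stated in the card; it is inferred from the ratios (e.g. 147 / 3.97 ≈ 37),
# giving a maximum total score of 37 * 5 = 185. Displayed values appear to be
# truncated (not rounded) to two decimals.
N_PROMPTS = 37          # inferred, not stated in the card
MAX_SCORE = N_PROMPTS * 5


def truncate2(x: float) -> float:
    """Truncate to two decimal places, matching the card's apparent style."""
    return int(x * 100) / 100


def summarize(score: int) -> tuple[float, float]:
    """Return (average on the 0~5 scale, percentage of the 185-point maximum)."""
    return truncate2(score / N_PROMPTS), truncate2(100 * score / MAX_SCORE)


print(summarize(147))  # gpt-3.5-turbo row -> (3.97, 79.45)
print(summarize(117))  # komt-llama2-7b-v1 row -> (3.16, 63.24)
```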
------------------------------------------------

# Original model card: Meta's Llama 2 7B-chat