Commit 97ca638
Parent(s): 5751faf
Update README.md
README.md CHANGED
@@ -101,6 +101,18 @@ Hard ACC:54.71
Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
+## MT-Bench on DPO Version
+| Model | MT-Bench |
+| ------------------------- | ------------ |
+| GPT-4 | 8.99 |
+| GPT-3.5-Turbo | 7.94 |
+| | |
+| Zephyr-7b-β (Overfitting) | 7.34 |
+| Zephyr-7b-α | 6.88 |
+| | |
+| **[CausalLM/14B-DPO-α](https://huggingface.co/CausalLM/14B-DPO-alpha)** | **7.618868** |
+| **[CausalLM/7B-DPO-α](https://huggingface.co/CausalLM/7B-DPO-alpha)** | **7.038125** |
+
## Other languages
For languages other than English and Chinese, we are currently unable to produce accurate benchmark templates for non-QA tasks. However, we will be working on versions of the QA-Task challenge in other languages in the near future.
### Japanese Benchmark