Update README.md
README.md
CHANGED
@@ -96,12 +96,15 @@ We conduct SFT with a relatively balanced mix of SFT data from different categor
 One of the most reliable ways to compare chatbot models is peer comparison. With the help of native speakers, we built an instruction test set that focus on various aspects expected in a user-facing chatbot, namely (1) NLP tasks (e.g. translation & comprehension), (2) Reasoning, (3) Instruction-following and (4) Natural and Informal questions. The test set also covers all languages that we are concerned with.

 **Pending peer comparison**
-
-
-
-
-
-
+
+
+
+<img src="seallm_vs_chatgpt_by_lang.png" width="800" />
+
+<img src="seallm_vs_chatgpt_by_cat_sea.png" width="800" />
+
+
+

 ### M3Exam - World Knowledge in Regional Languages
