Commit 0e1abcd by nxphi47 · 1 Parent(s): 33bee21

Update README.md

  1. README.md +9 -6
README.md CHANGED
@@ -96,12 +96,15 @@ We conduct SFT with a relatively balanced mix of SFT data from different categor
  One of the most reliable ways to compare chatbot models is peer comparison. With the help of native speakers, we built an instruction test set that focus on various aspects expected in a user-facing chatbot, namely (1) NLP tasks (e.g. translation & comprehension), (2) Reasoning, (3) Instruction-following and (4) Natural and Informal questions. The test set also covers all languages that we are concerned with.
  
  **Pending peer comparison**
- <!-- ! Add the stack chart better -->
- | vs ChatGPT | win | lose | tie
- | --- | --- | --- | --- |
- | Polylm-13b-chat | 204 | 1517 | 122
- | Qwen-14b-chat | 433 | 1128 | 306
- | SeaLLM-13bChat/SFT/v1 | 454 | 1185 | 209
+
+
+
+ <img src="seallm_vs_chatgpt_by_lang.png" width="800" />
+
+ <img src="seallm_vs_chatgpt_by_cat_sea.png" width="800" />
+
+
+
  
  ### M3Exam - World Knowledge in Regional Languages
  
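
Reviewer note: the table removed by this commit reports raw win/lose/tie counts, and the number of comparisons differs per model, so rates are easier to compare than counts. The following is only a minimal sketch of that conversion, not the authors' evaluation code. The counts are copied from the removed rows; the names `results` and `rates` are hypothetical and chosen for illustration.

```python
# Win/lose/tie counts against ChatGPT, copied from the table removed in this commit.
# `results` and `rates` are illustrative names, not part of the SeaLLM codebase.
results = {
    "Polylm-13b-chat": (204, 1517, 122),
    "Qwen-14b-chat": (433, 1128, 306),
    "SeaLLM-13bChat/SFT/v1": (454, 1185, 209),
}


def rates(win, lose, tie):
    """Return (win, lose, tie) as fractions of all comparisons for one model."""
    total = win + lose + tie
    return win / total, lose / total, tie / total


for model, counts in results.items():
    w, l, t = rates(*counts)
    # Each model is normalised by its own total, since the totals differ.
    print(f"{model}: win {w:.1%}, lose {l:.1%}, tie {t:.1%}")
```

Normalising per model is a deliberate choice here: it keeps the three rows comparable even though each model was judged on a slightly different number of prompts.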