Update README.md

We use GPT-4 as an evaluator to rate the comparison between our models versus ChatGPT.
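As a rough illustration only (not the repository's actual evaluation code), the "X% as good as" figures below can be read as the mean GPT-4 rating of our model expressed as a fraction of the baseline's mean rating. The function and the ratings here are hypothetical:

```python
def relative_score(model_ratings, baseline_ratings):
    """Mean rating of our model as a fraction of the baseline's mean rating."""
    model_mean = sum(model_ratings) / len(model_ratings)
    baseline_mean = sum(baseline_ratings) / len(baseline_ratings)
    return model_mean / baseline_mean

# Hypothetical GPT-4 ratings on a 1-10 scale, for illustration only.
ours = [6, 7, 5, 6]
chatgpt = [9, 8, 9, 8]
print(f"{relative_score(ours, chatgpt):.0%} as good as the baseline")
# prints "71% as good as the baseline"
```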
Compared with [PolyLM-13b-chat](https://arxiv.org/pdf/2307.06018.pdf), a recent multilingual model, our model significantly outperforms it across all languages and categories.

<div class="row" style="display: flex; clear: both;">
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_polylm_by_lang.png" alt="SeaLLM vs. PolyLM by language" style="width:100%">
  </div>
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_polylm_by_cat_sea.png" alt="SeaLLM vs. PolyLM by category" style="width:100%">
  </div>
</div>

Compared with Llama-2-13b-chat, our SeaLLM-13b performs significantly better in all SEA languages, even though Llama-2 was already trained on a decent amount of Vi, Id, and Th data. In English, our model performs 46% as well as Llama-2-13b-chat, even though it did not undergo complex, human-labor-intensive RLHF.

<div class="row" style="display: flex; clear: both;">
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_llama2_by_lang.png" alt="SeaLLM vs. Llama-2 by language" style="width:100%">
  </div>
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_llama2_by_cat_sea.png" alt="SeaLLM vs. Llama-2 by category" style="width:100%">
  </div>
</div>

Compared with ChatGPT-3.5, our SeaLLM-13b model performs 45% as well as ChatGPT for Thai. On important aspects such as safety and task-solving, our model is nearly on par with ChatGPT across the languages.

<div class="row" style="display: flex; clear: both;">
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_chatgpt_by_lang.png" alt="SeaLLM vs. ChatGPT by language" style="width:100%">
  </div>
  <div class="column" style="float: left; width: 49%">
    <img src="seallm_vs_chatgpt_by_cat_sea.png" alt="SeaLLM vs. ChatGPT by category" style="width:100%">
  </div>
</div>

### M3Exam - World Knowledge in Regional Languages