Update README.md
README.md
CHANGED
@@ -132,10 +132,7 @@ As shown in the table, our SeaLLM model outperforms most 13B baselines and reach
 | Llama-2-13b-chat | 61.17 | 43.29 | 39.97 | 35.50 | 23.74
 | Polylm-13b-chat | 32.23 | 29.26 | 29.01 | 25.36 | 18.08
 | Qwen-PolyLM-7b-chat | 53.65 | 61.58 | 39.26 | 33.69 | 29.02
-| SeaLLM-13b
-| SeaLLM-13bChat/SFT/v1 | 63.53 | 45.47 | 50.25 | 39.85 | 36.07
-| SeaLLM-13bChat/SFT/v2 | 62.35 | 45.81 | 49.92 | 40.04 | 36.49
-
+| SeaLLM-13b-chat | 63.53 | 46.31 | 49.25 | 40.61 | 36.30


 ### MMLU - Preserving English-based knowledge
@@ -164,8 +161,7 @@ As shown in the table below, the 1-shot reading comprehension performance is sig
 |-----------| ------- | ------- | ------- | ------- | ------- | ------- | ------- |
 | Llama-2-13b | 83.22 | 78.02 | 71.03 | 59.31 | 30.73 | 64.46 | 59.77
 | Llama-2-13b-chat | 80.46 | 70.54 | 62.87 | 63.05 | 25.73 | 60.93 | 51.21
-| SeaLLM-13b-chat
-| SeaLLM-13b-chat-v2 | 81.51 | 76.10 | 73.64 | 69.11 | 64.54 | 72.98 | 69.10
+| SeaLLM-13b-chat | 75.23 | 75.65 | 72.86 | 64.37 | 61.37 | 69.90 | 66.20


 #### Translation
@@ -174,12 +170,12 @@ For translation tasks, we evaluate our models with the [FloRes-200](https://gith

 Similarly observed, our SeaLLM models outperform Llama-2 significantly in the new languages.

+
 | FloRes-200 (chrF++) | En-Zh | En-Vi | En-Id | En-Th | En->X | Zh-En | Vi-En | Id-En | Th-En | X->En
 |-------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Llama-2-13b
-| Llama-2-13b-chat
-| SeaLLM-13b-chat
-| SeaLLM-13b-chat-v2 | 22.75 | 58.78 | 65.90 | 42.60 | 55.76 | 53.34 | 60.80 | 65.44 | 57.05 | 61.10
+| Llama-2-13b | 24.36 | 53.20 | 60.41 | 22.16 | 45.26 | 53.20 | 59.10 | 63.42 | 38.48 | 53.55
+| Llama-2-13b-chat | 19.58 | 51.70 | 57.14 | 21.18 | 37.40 | 52.27 | 54.32 | 60.55 | 30.18 | 49.33
+| SeaLLM-13b-chat | 23.12 | 53.67 | 59.00 | 60.93 | 66.16 | 65.66 | 43.33 | 57.39

 Our models are also performing competitively with ChatGPT for translation between SEA languages without English pivoting.

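The translation rows above are scored with chrF++, per the `FloRes-200 (chrF++)` header. As a rough illustration only, here is a minimal chrF++ scoring sketch using the `sacrebleu` package; the commit does not show the actual evaluation pipeline, and the example sentences are toy data.

```python
# Minimal sketch: corpus-level chrF++ (character n-grams plus word bigrams),
# the metric named in the FloRes-200 table header. Toy data, not the authors' pipeline.
from sacrebleu.metrics import CHRF

hypotheses = ["Chào buổi sáng, thế giới."]               # system translations (toy)
references = [["Chào thế giới, buổi sáng tốt lành."]]    # one reference stream, aligned with hypotheses

chrf_pp = CHRF(word_order=2)  # word_order=2 turns chrF into chrF++
print(chrf_pp.corpus_score(hypotheses, references))  # prints a line like "chrF2++ = <score>"
```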
@@ -197,7 +193,7 @@ Lastly, in 2-shot [XL-sum summarization tasks](https://aclanthology.org/2021.fin
 |-------- | ---- | ---- | ---- | ---- | ---- |
 | Llama-2-13b | 32.57 | 34.37 | 18.61 | 25.14 | 16.91
 | Llama-2-13b-chat | 25.11 | 31.13 | 18.29 | 22.45 | 17.51
-| SeaLLM-13b-chat
+| SeaLLM-13b-chat | 26.88 | 33.39 | 19.39 | 25.96 | 21.37

 ## Acknowledge our linguists

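The visible part of the diff does not name the metric for the XL-Sum rows above; XL-Sum results are conventionally reported as ROUGE scores, so the sketch below assumes ROUGE-L and uses the `rouge-score` package purely for illustration. Multilingual XL-Sum evaluation normally also requires language-specific tokenization, which this toy snippet omits.

```python
# Hypothetical sketch: ROUGE-L F1 between a reference summary and a model summary.
# Assumes the XL-Sum table reports a ROUGE-style score; the commit itself does not
# show the metric or the evaluation code.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

reference = "jakarta records its heaviest rainfall in a decade"   # toy reference summary
prediction = "jakarta sees heaviest rain in ten years"            # toy model output

scores = scorer.score(target=reference, prediction=prediction)
print(round(scores["rougeL"].fmeasure, 4))
```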