Update README.md
README.md CHANGED
```diff
@@ -115,7 +115,6 @@ We use GPT-4 as an evaluator to rate the comparison between our models versus Ch
 <img src="seallm_vs_chatgpt_by_cat_sea.png" width="800" />
 
 
-
 ### M3Exam - World Knowledge in Regional Languages
 
 
```
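The context line of this hunk refers to using GPT-4 as an evaluator to rate SeaLLM responses against ChatGPT's (the LLM-as-judge setup behind the category chart in the image above). As a side note, here is a minimal sketch of what one pairwise judge call might look like, assuming the `openai` Python client; the prompt wording, the 1-10 scale, and the `judge_pair` helper are illustrative assumptions, not the repository's actual rubric.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical judge prompt; the wording and the 1-10 scale are assumptions.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the two responses to the
question below for helpfulness, relevance, and fluency, each on a 1-10 scale.
Reply exactly as "A: <score> B: <score>".

[Question]
{question}

[Response A]
{answer_a}

[Response B]
{answer_b}"""

def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 to score response A (e.g. SeaLLM) against response B (e.g. ChatGPT)."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # make judging as deterministic as the API allows
        messages=[{
            "role": "user",
            "content": JUDGE_TEMPLATE.format(
                question=question, answer_a=answer_a, answer_b=answer_b),
        }],
    )
    return resp.choices[0].message.content
```

Pairwise judging of this kind is usually run in both (A, B) and (B, A) orders to offset the judge's position bias; whether this README's evaluation does so is not stated in the hunk.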
```diff
@@ -141,10 +140,9 @@ On the 5-shot [MMLU](https://arxiv.org/abs/2009.03300), our SeaLLM models not on
 
 | MMLU (Acc) | STEM | Humanities | Social | Others | Average
 |-----------| ------- | ------- | ------- | ------- | ------- |
-| Llama-2-13b
-| Llama-2-13b-chat
-| SeaLLM-
-| SeaLLM-13bChat/SFT/v3 | 43.30 | 52.80 | 63.10 | 61.20 | 55.00
+| Llama-2-13b | 44.1 | 52.8 | 62.6 | 61.1 | 54.8
+| Llama-2-13b-chat | 43.7 | 49.3 | 62.6 | 60.1 | 53.5
+| SeaLLM-13b-chat | 43.4 | 53.0 | 63.3 | 61.4 | 55.1
 
 
 ### NLP tasks
```
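This hunk reports 5-shot MMLU accuracy per category plus an average. For illustration, below is a sketch of the standard few-shot MMLU setup and the score aggregation, assuming a simple question/options/answer record layout; `format_example`, `build_prompt`, and `score` are hypothetical helpers, and since the hunk does not say whether the table's "Average" is per-question (micro) or per-category (macro), both are computed.

```python
from collections import defaultdict

CHOICES = "ABCD"

def format_example(q: dict, with_answer: bool = True) -> str:
    """Render one MMLU item in the common 'question / A-D choices / Answer:' style."""
    lines = [q["question"]]
    lines += [f"{c}. {opt}" for c, opt in zip(CHOICES, q["options"])]
    lines.append("Answer:" + (f" {q['answer']}" if with_answer else ""))
    return "\n".join(lines)

def build_prompt(dev_shots: list[dict], test_q: dict) -> str:
    """5-shot prompt: five solved dev examples, then the unanswered test item."""
    parts = [format_example(s) for s in dev_shots[:5]]
    parts.append(format_example(test_q, with_answer=False))
    return "\n\n".join(parts)

def score(results: list[dict]) -> dict:
    """results: [{'category': 'STEM', 'pred': 'A', 'gold': 'B'}, ...] -> accuracies (%)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        hits[r["category"]] += int(r["pred"] == r["gold"])
    acc = {c: 100.0 * hits[c] / totals[c] for c in totals}
    # Two common aggregates; the table does not say which "Average" it uses.
    acc["Macro average"] = sum(acc[c] for c in totals) / len(totals)
    acc["Micro average"] = 100.0 * sum(hits.values()) / sum(totals.values())
    return acc
```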