Update README.md
README.md
CHANGED
@@ -93,11 +93,12 @@ We conduct SFT with a relatively balanced mix of SFT data from different categor

### Peer Comparison

-One of the most reliable ways to compare chatbot models is peer comparison.
-
-
-
+One of the most reliable ways to compare chatbot models is peer comparison.
+With the help of native speakers, we built an instruction test set that focuses on various aspects expected of a user-facing chatbot, namely:
+(1) NLP tasks (e.g., translation & comprehension), (2) Reasoning, (3) Instruction-following, and
+(4) Natural and informal questions. The test set also covers all the languages that we are concerned with.

+We use GPT-4 as an evaluator to rate our models against ChatGPT-3.5 and other baselines.

<img src="seallm_vs_chatgpt_by_lang.png" width="800" />

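The hunk above describes the peer-comparison protocol: GPT-4 acts as a judge over responses from SeaLLM, ChatGPT-3.5, and other baselines on the instruction test set. The sketch below is a minimal illustration of that kind of pairwise GPT-4-as-judge setup, assuming the `openai` Python client; the prompt wording, scoring rubric, and tie handling are assumptions for illustration, not the exact setup used for SeaLLM.

```python
# Minimal sketch of a pairwise GPT-4-as-judge comparison (illustrative only).
# Assumes the `openai` Python package (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical judging prompt; the actual SeaLLM evaluation prompt is not shown here.
JUDGE_PROMPT = """You are a strict evaluator. Given a user instruction and two
responses (A and B), decide which response is better, or declare a tie.
Answer with exactly one of: "A", "B", or "tie".

Instruction:
{instruction}

Response A:
{response_a}

Response B:
{response_b}
"""

def judge_pair(instruction: str, response_a: str, response_b: str) -> str:
    """Ask GPT-4 to pick the better of two responses to the same instruction."""
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                instruction=instruction,
                response_a=response_a,
                response_b=response_b,
            ),
        }],
    )
    # In practice the A/B order would also be swapped and re-judged to reduce position bias.
    return completion.choices[0].message.content.strip()

# Example: compare a SeaLLM answer against a ChatGPT-3.5 answer for one test item.
verdict = judge_pair(
    instruction="Translate 'Good morning' into Vietnamese.",
    response_a="Chào buổi sáng.",
    response_b="Buổi sáng tốt lành.",
)
print(verdict)  # "A", "B", or "tie"
```

Aggregating such verdicts over the test set, per language, yields the kind of by-language comparison summarized in `seallm_vs_chatgpt_by_lang.png`.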
@@ -127,19 +128,6 @@ As shown in the table, our SeaLLM model outperforms most 13B baselines and reach
| SeaLLM-13bChat/SFT/v2 | 62.35 | 45.81 | 49.92 | 40.04 | 36.49


-<!-- ! Considering removing zero-shot from the main article -->
-<!-- | Random | 25.00 | 25.00 | 25.00 | 23.00 | 23.00 -->
-<!-- | M3-exam / 0-shot | En | Zh | Vi | Id | Th
-|-----------| ------- | ------- | ------- | ------- | ------- |
-| ChatGPT | 75.98 | 61.00 | 57.18 | 48.58 | 34.09
-| Llama-2-13b | 19.49 | 39.07 | 35.38 | 23.66 | 12.44
-| Llama-2-13b-chat | 52.57 | 39.52 | 36.56 | 27.39 | 10.40
-| Polylm-13b-chat | 28.74 | 27.71 | 25.77 | 22.01 | 13.65
-| Qwen-PolyLM-7b-chat | 52.51 | 56.14 | 32.34 | 25.49 | 24.64
-| SeaLLM-13b/78k-step | 36.68 | 36.58 | 41.98 | 25.87 | 20.11
-| SeaLLM-13bChat/SFT/v1 | 64.30 | 45.58 | 48.13 | 37.76 | 30.77
-| SeaLLM-13bChat/SFT/v2 | 62.23 | 41.00 | 47.23 | 35.10 | 30.77 -->
-

### MMLU - Preserving English-based knowledge
