Update README.md (#1)
- Update README.md (776fb41c1a6da3344edfc852baea3b3614bd404c)
Co-authored-by: Jed Cheng <[email protected]>
README.md
    {"role": "user", "content": prompt}
]
print(chat(messages))  # 香港特別行政區行政長官係李家超。<|im_end|>  ("The Chief Executive of the HKSAR is John Lee Ka-chiu.")
```
## Performance
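
CantonesellmChat is the best-in-class open-source LLM for understanding Cantonese and Hong Kong culture on the [HK-Eval Benchmark](https://arxiv.org/pdf/2503.12440): CantonesellmChat v1.0 72B has the highest scores of any open-source non-reasoning model evaluated, in both the HK Culture and Cantonese Linguistics categories.

However, as the table shows, reasoning models such as DeepSeek R1 and Gemini 2.5 Pro perform dramatically better than non-reasoning models. We are currently working on reasoning models for v2.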

| Model                     | HK Culture (zero-shot) | Cantonese Linguistics |
|---------------------------|:----------------------:|:---------------------:|
| CantonesellmChat v0.5 6B  | 52.0%                  | 12.8%                 |
| CantonesellmChat v0.5 34B | 72.5%                  | 54.5%                 |
| CantonesellmChat v1.0 3B  | 56.0%                  | 45.7%                 |
| CantonesellmChat v1.0 7B  | 60.3%                  | 46.5%                 |
| CantonesellmChat v1.0 32B | 69.8%                  | 52.7%                 |
| CantonesellmChat v1.0 72B | 75.4%                  | 59.6%                 |
| Llama 3.1 8B Instruct     | 45.6%                  | 35.1%                 |
| Llama 3.1 70B Instruct    | 63.0%                  | 50.3%                 |
| Qwen2.5 7B Instruct       | 51.2%                  | 30.3%                 |
| Qwen2.5 32B Instruct      | 59.9%                  | 45.1%                 |
| Qwen2.5 72B Instruct      | 65.9%                  | 45.9%                 |
| Claude 3.5 Sonnet         | 71.7%                  | 63.2%                 |
| DeepSeek R1               | 88.8%                  | 77.5%                 |
| Gemini 2.0 Flash          | 80.2%                  | 75.3%                 |
| Gemini 2.5 Pro            | 92.1%                  | 87.3%                 |
| GPT-4o                    | 77.5%                  | 63.8%                 |
| GPT-4o mini               | 55.6%                  | 57.3%                 |
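
To make the comparison in the Performance table concrete, here is a small Python sketch that loads the HK-Eval scores and checks two claims: which open-source non-reasoning model leads each category, and how far the best reasoning model is ahead of the best non-reasoning one. The `open_weights` and `reasoning` flags are our own labels (an assumption, not part of HK-Eval), and `Result`, `best`, and `reasoning_gap` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    hk_culture: float   # HK Culture (zero-shot), %
    linguistics: float  # Cantonese Linguistics, %
    open_weights: bool  # assumption: our own labelling, not part of HK-Eval
    reasoning: bool     # assumption: our own labelling, not part of HK-Eval


# Scores copied from the Performance table.
RESULTS = [
    Result("CantonesellmChat v0.5 6B", 52.0, 12.8, True, False),
    Result("CantonesellmChat v0.5 34B", 72.5, 54.5, True, False),
    Result("CantonesellmChat v1.0 3B", 56.0, 45.7, True, False),
    Result("CantonesellmChat v1.0 7B", 60.3, 46.5, True, False),
    Result("CantonesellmChat v1.0 32B", 69.8, 52.7, True, False),
    Result("CantonesellmChat v1.0 72B", 75.4, 59.6, True, False),
    Result("Llama 3.1 8B Instruct", 45.6, 35.1, True, False),
    Result("Llama 3.1 70B Instruct", 63.0, 50.3, True, False),
    Result("Qwen2.5 7B Instruct", 51.2, 30.3, True, False),
    Result("Qwen2.5 32B Instruct", 59.9, 45.1, True, False),
    Result("Qwen2.5 72B Instruct", 65.9, 45.9, True, False),
    Result("Claude 3.5 Sonnet", 71.7, 63.2, False, False),
    Result("DeepSeek R1", 88.8, 77.5, True, True),
    Result("Gemini 2.0 Flash", 80.2, 75.3, False, False),
    Result("Gemini 2.5 Pro", 92.1, 87.3, False, True),
    Result("GPT-4o", 77.5, 63.8, False, False),
    Result("GPT-4o mini", 55.6, 57.3, False, False),
]


def best(results, metric):
    """Return the Result with the highest score on `metric` ("hk_culture" or "linguistics")."""
    return max(results, key=lambda r: getattr(r, metric))


def reasoning_gap(results, metric):
    """Percentage-point gap between the best reasoning and the best non-reasoning model."""
    top_reasoning = best([r for r in results if r.reasoning], metric)
    top_other = best([r for r in results if not r.reasoning], metric)
    return round(getattr(top_reasoning, metric) - getattr(top_other, metric), 1)


if __name__ == "__main__":
    open_non_reasoning = [r for r in RESULTS if r.open_weights and not r.reasoning]
    for metric in ("hk_culture", "linguistics"):
        leader = best(open_non_reasoning, metric)
        print(f"{metric}: best open non-reasoning = {leader.model}, "
              f"reasoning gap = {reasoning_gap(RESULTS, metric)} pts")
```

With these numbers, CantonesellmChat v1.0 72B leads both categories among the open-source non-reasoning models, and the best reasoning model is 11.9 points ahead on HK Culture and 12.0 points ahead on Cantonese Linguistics.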