indiejoseph jed351 committed on
Commit 3595165 · verified · 1 Parent(s): 6abfbf6

Update README.md (#1)

- Update README.md (776fb41c1a6da3344edfc852baea3b3614bd404c)


Co-authored-by: Jed Cheng <[email protected]>

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -48,4 +48,29 @@ messages = [
  {"role": "user", "content": prompt}
  ]
  print(chat(messages)) # 香港特別行政區行政長官係李家超。<|im_end|>
- ```
+ ```
+
+
+ ## Performance
+ Best-in-class open-source LLM for understanding Cantonese and Hong Kong culture on the [HK-Eval Benchmark](https://arxiv.org/pdf/2503.12440).
+ However, as the table shows, reasoning models perform dramatically better than their non-reasoning counterparts. We are currently working on reasoning models for v2.
+
+ | Model | HK Culture (zero-shot) | Cantonese Linguistics |
+ |---------------------------|:----------------------:|:---------------------:|
+ | CantonesellmChat v0.5 6B | 52.0% | 12.8% |
+ | CantonesellmChat v0.5 34B | 72.5% | 54.5% |
+ | CantonesellmChat v1.0 3B | 56.0% | 45.7% |
+ | CantonesellmChat v1.0 7B | 60.3% | 46.5% |
+ | CantonesellmChat v1.0 32B | 69.8% | 52.7% |
+ | CantonesellmChat v1.0 72B | 75.4% | 59.6% |
+ | Llama 3.1 8B Instruct | 45.6% | 35.1% |
+ | Llama 3.1 70B Instruct | 63.0% | 50.3% |
+ | Qwen2.5 7B Instruct | 51.2% | 30.3% |
+ | Qwen2.5 32B Instruct | 59.9% | 45.1% |
+ | Qwen2.5 72B Instruct | 65.9% | 45.9% |
+ | Claude 3.5 Sonnet | 71.7% | 63.2% |
+ | DeepSeek R1 | 88.8% | 77.5% |
+ | Gemini 2.0 Flash | 80.2% | 75.3% |
+ | Gemini 2.5 Pro | 92.1% | 87.3% |
+ | GPT4o | 77.5% | 63.8% |
+ | GPT4o-mini | 55.6% | 57.3% |