hon9kon9ize/CantoneseLLMChat-v1.0-72B

indiejoseph, jed351 committed 14 days ago

Commit 3595165 (verified) · 1 parent: 6abfbf6

Update README.md (#1)

- Update README.md (776fb41c1a6da3344edfc852baea3b3614bd404c)

Co-authored-by: Jed Cheng <[email protected]>

Files changed (1)

README.md +26 -1

@@ -48,4 +48,29 @@ messages = [
     {"role": "user", "content": prompt}
 ]
 print(chat(messages)) # 香港特別行政區行政長官係李家超。<|im_end|>
-```

+```
+## Performance
+Among the open-source, non-reasoning models listed below, CantoneseLLMChat v1.0 72B scores highest at understanding Cantonese and Hong Kong culture on the [HK-Eval Benchmark](https://arxiv.org/pdf/2503.12440).
+However, as the table shows, reasoning models such as DeepSeek R1 perform dramatically better than non-reasoning models. We are currently working on reasoning models for v2.
+| Model                     | HK Culture (zero-shot) | Cantonese Linguistics |
+|---------------------------|:----------------------:|:---------------------:|
+| CantoneseLLMChat v0.5 6B  |          52.0%         |         12.8%         |
+| CantoneseLLMChat v0.5 34B |          72.5%         |         54.5%         |
+| CantoneseLLMChat v1.0 3B  |          56.0%         |         45.7%         |
+| CantoneseLLMChat v1.0 7B  |          60.3%         |         46.5%         |
+| CantoneseLLMChat v1.0 32B |          69.8%         |         52.7%         |
+| CantoneseLLMChat v1.0 72B |          75.4%         |         59.6%         |
+| Llama 3.1 8B Instruct     |          45.6%         |         35.1%         |
+| Llama 3.1 70B Instruct    |          63.0%         |         50.3%         |
+| Qwen2.5 7B Instruct       |          51.2%         |         30.3%         |
+| Qwen2.5 32B Instruct      |          59.9%         |         45.1%         |
+| Qwen2.5 72B Instruct      |          65.9%         |         45.9%         |
+| Claude 3.5 Sonnet         |          71.7%         |         63.2%         |
+| DeepSeek R1               |          88.8%         |         77.5%         |
+| Gemini 2.0 Flash          |          80.2%         |         75.3%         |
+| Gemini 2.5 Pro            |          92.1%         |         87.3%         |
+| GPT4o                     |          77.5%         |         63.8%         |
+| GPT4o-mini                |          55.6%         |         57.3%         |
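
For readers who want to compare rows of the table added in this diff, here is a minimal Python sketch. It is not part of the commit or of the HK-Eval tooling. The scores are copied by hand from the table, and the names `SCORES`, `rank`, and `pp_gap` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the Performance table in this diff:
# model -> (HK Culture zero-shot %, Cantonese Linguistics %)
SCORES = {
    "CantoneseLLMChat v0.5 6B": (52.0, 12.8),
    "CantoneseLLMChat v0.5 34B": (72.5, 54.5),
    "CantoneseLLMChat v1.0 3B": (56.0, 45.7),
    "CantoneseLLMChat v1.0 7B": (60.3, 46.5),
    "CantoneseLLMChat v1.0 32B": (69.8, 52.7),
    "CantoneseLLMChat v1.0 72B": (75.4, 59.6),
    "Llama 3.1 8B Instruct": (45.6, 35.1),
    "Llama 3.1 70B Instruct": (63.0, 50.3),
    "Qwen2.5 7B Instruct": (51.2, 30.3),
    "Qwen2.5 32B Instruct": (59.9, 45.1),
    "Qwen2.5 72B Instruct": (65.9, 45.9),
    "Claude 3.5 Sonnet": (71.7, 63.2),
    "DeepSeek R1": (88.8, 77.5),
    "Gemini 2.0 Flash": (80.2, 75.3),
    "Gemini 2.5 Pro": (92.1, 87.3),
    "GPT4o": (77.5, 63.8),
    "GPT4o-mini": (55.6, 57.3),
}


def rank(column: int) -> list[str]:
    """Return model names sorted from highest to lowest score.

    column 0 is HK Culture (zero-shot), column 1 is Cantonese Linguistics.
    """
    return sorted(SCORES, key=lambda name: SCORES[name][column], reverse=True)


def pp_gap(model_a: str, model_b: str) -> tuple[float, ...]:
    """Return the percentage-point difference (model_a minus model_b) per column."""
    a, b = SCORES[model_a], SCORES[model_b]
    return tuple(round(x - y, 1) for x, y in zip(a, b))


if __name__ == "__main__":
    # Top five models on HK Culture (zero-shot).
    print(rank(0)[:5])
    # Gap between the 72B model and the same-size Qwen2.5 row.
    print(pp_gap("CantoneseLLMChat v1.0 72B", "Qwen2.5 72B Instruct"))  # (9.5, 13.7)
```

The gaps are reported in percentage points rather than relative percent, since both columns are already accuracy percentages.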