---
library_name: transformers
license: gemma
language:
- en
- zh
base_model:
- google/gemma-2-9b-it
pipeline_tag: text-generation
---

# Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning

[![DOI](https://zenodo.org/badge/844304447.svg)](https://zenodo.org/badge/latestdoi/844304447)

<p align="left">
    🤗 <a href="https://huggingface.co/zake7749/Llama-3.2-1B-it-chinese-kyara/">Hugging Face</a>&nbsp; | 🚀 <a href="https://github.com/zake7749/kyara">GitHub</a>&nbsp; | &nbsp;📑 <a href="#">Paper</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp; | &nbsp;📖 <a href="https://github.com/zake7749/kyara">Chinese</a>&nbsp; | &nbsp;💻 <a href="https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm">Kaggle Notebook</a>
</p>

<div style="text-align: center;">
  <img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
</div>

Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project aimed at improving language models through knowledge retrieval. The project seeks to enhance the model’s ability to adapt knowledge and to improve language comprehension, particularly in underrepresented languages such as Traditional Chinese. Traditional Chinese data is scarce compared to the vast English corpus used for model training, and Kyara addresses this gap by expanding the limited corpus available for the language.

This release is a preview version of the Kyara-2.5 series. Compared to [Kyara-1.5](https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo), this iteration incorporates a significantly larger volume of high-quality STEM content and challenging reasoning datasets. It also employs online reinforcement techniques for preference optimization to refine the model’s performance.

## Benchmark

All evaluations are conducted in a zero-shot setting.

| Metric                   | Kyara-9b-it    | Gemma-2-9b-it |
|--------------------------|----------|-------------|
| **[TMMLUPlus](https://huggingface.co/datasets/ikala/tmmluplus)**            | **60.74** | 54.77    |
| &emsp;- STEM               | **69.54**   | 58.11      |
| &emsp;- Humanities         | **52.64**   | 48.71      |
| &emsp;- Other              | **57.10**   | 51.43      |
| &emsp;- Social-Science     | **63.69**   | 60.84      |
| **[MMLU-Redux](https://github.com/yuchenlin/ZeroEval)**       | **73.04** | 72.82     |
| **[GSM8K](https://github.com/yuchenlin/ZeroEval)**            | **90.37**     | 87.41 |
| **[MATH-L5](https://github.com/yuchenlin/ZeroEval)**          | **31.35**  | 19.42      |
| **[CRUX](https://github.com/yuchenlin/ZeroEval)**             | **49.25**    | 46.00        |
| **[MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)**    | **8.81** | 8.53      |
| **[MT-Bench-TW](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2)**    | **8.36** | 7.80   |
| **[Chatbot-Arena-Hard](https://github.com/lmarena/arena-hard-auto)**    | **43.90** | 33.60      |
| **[AlignBench](https://github.com/THUDM/AlignBench)**  | **7.25** | 6.88 |

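The overall TMMLUPlus row appears to equal the unweighted mean of the four category rows. This is an observation from the numbers in the table above, not a documented aggregation rule, so treat the following as a minimal sanity-check sketch. The names `tmmlu_categories` and `macro_average` are illustrative only.

```python
from statistics import mean

# Category scores (percent) copied from the benchmark table above.
tmmlu_categories = {
    "Kyara-9b-it": {
        "STEM": 69.54,
        "Humanities": 52.64,
        "Other": 57.10,
        "Social-Science": 63.69,
    },
    "Gemma-2-9b-it": {
        "STEM": 58.11,
        "Humanities": 48.71,
        "Other": 51.43,
        "Social-Science": 60.84,
    },
}

# Overall TMMLUPlus scores as reported in the same table.
reported_overall = {"Kyara-9b-it": 60.74, "Gemma-2-9b-it": 54.77}


def macro_average(category_scores):
    """Unweighted mean over categories (assumed aggregation, not confirmed)."""
    return mean(category_scores.values())


for model, categories in tmmlu_categories.items():
    overall = macro_average(categories)
    print(f"{model}: computed {overall:.2f}, reported {reported_overall[model]:.2f}")
```

With the values in the table, the computed means round to the reported overall scores for both models.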
### Details of TMMLU+

#### STEM

| sub_category                    | score   |
|----------------------------------|---------|
| advance_chemistry               | 0.650407 |
| basic_medical_science           | 0.681342 |
| computer_science                | 0.839080 |
| engineering_math                | 0.611650 |
| junior_chemistry                | 0.708134 |
| junior_math_exam                | 0.720000 |
| junior_science_exam             | 0.755869 |
| organic_chemistry               | 0.678899 |
| pharmacy                        | 0.452685 |
| physics                         | 0.742268 |
| secondary_physics               | 0.660714 |
| statistics_and_machine_learning | 0.794643 |
| tve_mathematics                 | 0.766667 |
| tve_natural_sciences            | 0.674528 |

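The sub-category scores are reported as fractions in [0, 1], while the main benchmark table uses percentages. As a rough cross-check, the sketch below averages the STEM sub-category scores without weighting and converts the result to a percentage. It assumes, without confirmation from the source, that the category score is a plain macro-average; the variable names are illustrative.

```python
# STEM sub-category scores (fractions) copied from the table above.
stem_scores = {
    "advance_chemistry": 0.650407,
    "basic_medical_science": 0.681342,
    "computer_science": 0.839080,
    "engineering_math": 0.611650,
    "junior_chemistry": 0.708134,
    "junior_math_exam": 0.720000,
    "junior_science_exam": 0.755869,
    "organic_chemistry": 0.678899,
    "pharmacy": 0.452685,
    "physics": 0.742268,
    "secondary_physics": 0.660714,
    "statistics_and_machine_learning": 0.794643,
    "tve_mathematics": 0.766667,
    "tve_natural_sciences": 0.674528,
}

# Unweighted mean over sub-categories, expressed as a percentage.
stem_macro_percent = 100 * sum(stem_scores.values()) / len(stem_scores)

# STEM score for Kyara-9b-it as reported in the main benchmark table.
reported_stem_percent = 69.54

print(f"computed: {stem_macro_percent:.2f}, reported: {reported_stem_percent:.2f}")
```

The computed value comes out at about 69.55, within 0.01 of the reported 69.54. That is consistent with macro-averaging plus a small rounding or truncation difference, but it does not prove how the reported score was aggregated.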
#### Humanities

| sub_category                 | score   |
|------------------------------|--------|
| administrative_law          | 0.454762 |
| anti_money_laundering       | 0.738806 |
| general_principles_of_law   | 0.509434 |
| introduction_to_law         | 0.523207 |
| jce_humanities              | 0.577778 |
| taxation                    | 0.322667 |
| trust_practice              | 0.558603 |

#### Social Science

| sub_category                                      | score   |
|--------------------------------------------------|---------|
| chinese_language_and_literature                 | 0.457286 |
| clinical_psychology                             | 0.664000 |
| economics                                       | 0.702290 |
| education                                       | 0.653226 |
| education_(profession_level)                    | 0.458848 |
| educational_psychology                          | 0.670455 |
| geography_of_taiwan                             | 0.618490 |
| human_behavior                                  | 0.711974 |
| junior_chinese_exam                             | 0.765714 |
| macroeconomics                                  | 0.649635 |
| national_protection                             | 0.687204 |
| occupational_therapy_for_psychological_disorders | 0.699816 |
| physical_education                              | 0.569832 |
| politic_science                                 | 0.658291 |
| taiwanese_hokkien                               | 0.294574 |
| three_principles_of_people                      | 0.697842 |
| ttqav2                                          | 0.761062 |
| tve_chinese_language                            | 0.745342 |


#### Others

| sub_category                                        | score   |
|-----------------------------------------------------|--------:|
| accounting                                         | 0.350785 |
| agriculture                                        | 0.476821 |
| auditing                                           | 0.516364 |
| business_management                                | 0.661871 |
| culinary_skills                                    | 0.636986 |
| dentistry                                          | 0.581454 |
| finance_banking                                    | 0.592593 |
| financial_analysis                                 | 0.722513 |
| fire_science                                       | 0.483871 |
| insurance_studies                                  | 0.497368 |
| junior_social_studies                              | 0.785714 |
| logic_reasoning                                    | 0.589928 |
| management_accounting                              | 0.530233 |
| marketing_management                               | 0.784946 |
| mechanical                                         | 0.711864 |
| music                                             | 0.521583 |
| nautical_science                                   | 0.441016 |
| official_document_management                       | 0.513514 |
| optometry                                         | 0.441304 |
| pharmacology                                      | 0.639515 |
| real_estate                                       | 0.500000 |
| technical                                         | 0.604478 |
| trade                                             | 0.410359 |
| traditional_chinese_medicine_clinical_medicine    | 0.456835 |
| tve_design                                        | 0.735417 |
| veterinary_pathology                              | 0.519435 |
| veterinary_pharmacology                           | 0.711111 |