---
library_name: transformers
license: gemma
language:
- en
- zh
base_model:
- google/gemma-2-9b-it
pipeline_tag: text-generation
---
# Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning
[DOI](https://zenodo.org/badge/latestdoi/844304447)
<p align="left">
🤗 <a href="https://huggingface.co/zake7749/Llama-3.2-1B-it-chinese-kyara/">Hugging Face</a> | 🚀<a href="https://github.com/zake7749/kyara">Github</a> | 📑 <a href="#">Paper</a> | 📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a> | 📖 <a href="https://github.com/zake7749/kyara">Chinese</a> | 💻 <a href="https://www.kaggle.com/code/zake7749/kyara-a-compact-yet-powerful-chinese-llm">Kaggle Notebook</a>
</p>
<div style="text-align: center;">
<img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
</div>
Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project aimed at improving language models through knowledge retrieval processes. The project seeks to enhance the model’s ability to adapt knowledge and improve language comprehension, particularly in underrepresented languages like Traditional Chinese. Given the relatively scarce availability of Traditional Chinese data compared to the vast corpus of English data used for model training, Kyara addresses this gap by expanding the limited corpus for this language.
This release is a preview version of the Kyara-2.5 series. Compared to [Kyara-1.5](https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo), this iteration incorporates a significantly larger volume of high-quality STEM content and challenging reasoning datasets. Additionally, it employs online reinforcement learning techniques for preference optimization to further refine the model's performance.
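The model can be served with the standard `transformers` chat pipeline. The snippet below is a minimal usage sketch: the model ID is a placeholder based on this repository's naming convention and should be replaced with the actual repository ID, and the prompt is only an example.

```python
# Minimal inference sketch with the transformers chat pipeline.
# NOTE: the model ID below is a placeholder; substitute this repository's ID.
import torch
from transformers import pipeline

model_id = "zake7749/gemma-2-9b-it-chinese-kyara"  # placeholder

pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Gemma-2 chat format: a single user turn is enough for a quick test.
messages = [{"role": "user", "content": "請簡單介紹台灣的夜市文化。"}]
outputs = pipe(messages, max_new_tokens=512)

# The pipeline returns the full conversation; the last message is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```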
## Benchmark
All evaluations are conducted in a zero-shot setting.
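Zero-shot here means the model sees only the question and its options, with no in-context examples. The sketch below illustrates that setup on a single multiple-choice item; the prompt wording, answer parsing, and model ID are illustrative assumptions, not the exact harness behind the reported scores.

```python
# Illustrative zero-shot multiple-choice query (not the official eval harness).
import re
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="zake7749/gemma-2-9b-it-chinese-kyara",  # placeholder model ID
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

question = "光在真空中的速度約為每秒多少公里?"
options = {"A": "3 萬", "B": "30 萬", "C": "300 萬", "D": "3000 萬"}

# Build the prompt from the question and options only; no few-shot examples.
prompt = question + "\n" + "\n".join(f"({k}) {v}" for k, v in options.items())
prompt += "\n請直接回答選項字母。"

messages = [{"role": "user", "content": prompt}]
reply = pipe(messages, max_new_tokens=16)[0]["generated_text"][-1]["content"]

# Take the first option letter that appears in the reply as the prediction.
match = re.search(r"[ABCD]", reply)
print("Predicted:", match.group(0) if match else "N/A")
```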
| Metric | Kyara-9b-it | Gemma-2-9b-it |
|--------------------------|----------|-------------|
| **[TMMLUPlus](https://huggingface.co/datasets/ikala/tmmluplus)** | **60.74** | 54.77 |
|  - STEM | **69.54** | 58.11 |
|  - Humanities | **52.64** | 48.71 |
|  - Other | **57.10** | 51.43 |
|  - Social-Science | **63.69** | 60.84 |
| **[MMLU-Redux](https://github.com/yuchenlin/ZeroEval)** | **73.04** | 72.82 |
| **[GSM8K](https://github.com/yuchenlin/ZeroEval)** | **90.37** | 87.41 |
| **[MATH-L5](https://github.com/yuchenlin/ZeroEval)** | **31.35** | 19.42 |
| **[CRUX](https://github.com/yuchenlin/ZeroEval)** | **49.25** | 46.00 |
| **[MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)** | **8.81** | 8.53 |
| **[MT-Bench-TW](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2)** | **8.36** | 7.80 |
| **[Chatbot-Arena-Hard](https://github.com/lmarena/arena-hard-auto)** | **43.90** | 33.60 |
| **[AlignBench](https://github.com/THUDM/AlignBench)** | **7.25** | 6.88 |
### Details of TMMLU+
#### STEM
| sub_category | score |
|----------------------------------|---------|
| advance_chemistry | 0.650407 |
| basic_medical_science | 0.681342 |
| computer_science | 0.839080 |
| engineering_math | 0.611650 |
| junior_chemistry | 0.708134 |
| junior_math_exam | 0.720000 |
| junior_science_exam | 0.755869 |
| organic_chemistry | 0.678899 |
| pharmacy | 0.452685 |
| physics | 0.742268 |
| secondary_physics | 0.660714 |
| statistics_and_machine_learning | 0.794643 |
| tve_mathematics | 0.766667 |
| tve_natural_sciences | 0.674528 |
#### Humanities
| sub_category | score |
|------------------------------|--------|
| administrative_law | 0.454762 |
| anti_money_laundering | 0.738806 |
| general_principles_of_law | 0.509434 |
| introduction_to_law | 0.523207 |
| jce_humanities | 0.577778 |
| taxation | 0.322667 |
| trust_practice | 0.558603 |
#### Social Science
| sub_category | score |
|--------------------------------------------------|---------|
| chinese_language_and_literature | 0.457286 |
| clinical_psychology | 0.664000 |
| economics | 0.702290 |
| education | 0.653226 |
| education_(profession_level) | 0.458848 |
| educational_psychology | 0.670455 |
| geography_of_taiwan | 0.618490 |
| human_behavior | 0.711974 |
| junior_chinese_exam | 0.765714 |
| macroeconomics | 0.649635 |
| national_protection | 0.687204 |
| occupational_therapy_for_psychological_disorders | 0.699816 |
| physical_education | 0.569832 |
| politic_science | 0.658291 |
| taiwanese_hokkien | 0.294574 |
| three_principles_of_people | 0.697842 |
| ttqav2 | 0.761062 |
| tve_chinese_language | 0.745342 |
#### Others
| sub_category | score |
|-----------------------------------------------------|--------:|
| accounting | 0.350785 |
| agriculture | 0.476821 |
| auditing | 0.516364 |
| business_management | 0.661871 |
| culinary_skills | 0.636986 |
| dentistry | 0.581454 |
| finance_banking | 0.592593 |
| financial_analysis | 0.722513 |
| fire_science | 0.483871 |
| insurance_studies | 0.497368 |
| junior_social_studies | 0.785714 |
| logic_reasoning | 0.589928 |
| management_accounting | 0.530233 |
| marketing_management | 0.784946 |
| mechanical | 0.711864 |
| music | 0.521583 |
| nautical_science | 0.441016 |
| official_document_management | 0.513514 |
| optometry | 0.441304 |
| pharmacology | 0.639515 |
| real_estate | 0.500000 |
| technical | 0.604478 |
| trade | 0.410359 |
| traditional_chinese_medicine_clinical_medicine | 0.456835 |
| tve_design | 0.735417 |
| veterinary_pathology | 0.519435 |
| veterinary_pharmacology | 0.711111 |