---
language:
- en
- ko
library_name: transformers
license: other
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-1.5-15.7b-a3b-base
repo: kakaocorp/kanana-1.5-15.7b-a3b-base
developers: Kanana LLM
training_regime: bf16 mixed precision
---

<p align="center">
<br>
    <picture>
        <img src="./assets/logo/kanana-logo.png" width="60%" style="margin: 40px auto;">
    </picture>
</br>

<p align="center">
🤗 <a href="https://kko.kakao.com/kananallm">1.5 HF Models</a> &nbsp; |
&nbsp; 📕 <a href="https://tech.kakao.com/posts/716">Kanana-1.5-15.7B-A3B Blog</a> &nbsp;

<br>

## News 🔥

- ✨`2025/07/24`: Published a [blog post](https://tech.kakao.com/posts/716) about `Kanana-1.5-15.7B-A3B` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📕`2025/05/23`: Published a [blog post](https://tech.kakao.com/posts/707) about `Kanana 1.5` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📜`2025/02/27`: Released [Technical Report](https://arxiv.org/abs/2502.18934) and 🤗[HF model weights](https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259).
- 📕`2025/01/10`: Published a [blog post](https://tech.kakao.com/posts/682) about the development of `Kanana Nano` model.
- 📕`2024/11/14`: Published blog posts ([pre-training](https://tech.kakao.com/posts/661), [post-training](https://tech.kakao.com/posts/662)) about the development of `Kanana` models.
- ▶️`2024/11/06`: Published a [presentation video](https://youtu.be/HTBl142x9GI?si=o_we6t9suYK8DfX3) about the development of the `Kanana` models.

<br>

## Table of Contents

- [Kanana-1.5-15.7B-A3B](#kanana-15-157b-a3b)
    - [Performance](#performance)
        - [Base Model Evaluation](#base-model-evaluation)
        - [Instruct Model Evaluation](#instruct-model-evaluation)
        - [Evaluation Protocol](#evaluation-protocol)
    - [Quickstart](#quickstart)
- [Contributors](#contributors)
- [Citation](#citation)
- [Contact](#contact)

<br>

# Kanana-1.5-15.7B-A3B

Introducing `Kanana-1.5-15.7B-A3B`, the first Mixture-of-Experts (MoE) model in our Kanana family, engineered for high efficiency and strong performance. Thanks to its sparse architecture, `Kanana-1.5-15.7B-A3B` delivers capabilities comparable to the dense `Kanana-1.5-8B` model while using only 37% of its FLOPs per token, making it an inference-efficient and cost-effective choice for real-world applications. Furthermore, `Kanana-1.5-15.7B-A3B` is powered by our newly enhanced post-training strategy, which includes on-policy distillation followed by reinforcement learning.
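As a rough sanity check on the 37% figure, here is a back-of-the-envelope sketch using the common rule of thumb that a decoder forward pass costs about $2N$ FLOPs per token, where $N$ is the number of parameters used to compute that token. For an MoE model only the activated parameters count. This assumes, as the "A3B" suffix suggests, roughly 3B activated parameters per token, and about 8B parameters for the dense `Kanana-1.5-8B`; attention-score FLOPs (which grow with context length) and router overhead are ignored.

$$
\frac{\text{FLOPs/token}(\text{Kanana-1.5-15.7B-A3B})}{\text{FLOPs/token}(\text{Kanana-1.5-8B})}
\approx \frac{2 \times 3\,\text{B}}{2 \times 8\,\text{B}} = 0.375
$$

Under these assumptions the estimate is consistent with the stated 37%.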

> [!Note]
> Neither the pre-training nor the post-training data includes Kakao user data.

## Performance

### Base Model Evaluation
<table>
    <tr>
        <th>Models</th>
        <th>MMLU</th>
        <th>KMMLU</th>
        <th>HAERAE</th>
        <th>HumanEval</th>
        <th>MBPP</th>
        <th>GSM8K</th>
    </tr>
    <tr>
        <td><strong>Kanana-1.5-15.7B-A3B</strong></td>
        <td align="center">64.79</td>
        <td align="center">51.77</td>
        <td align="center">83.23</td>
        <td align="center">59.76</td>
        <td align="center">60.10</td>
        <td align="center">61.18</td>
    </tr>
    <tr>
        <td>Kanana-1.5-8B</td>
        <td align="center">64.24</td>
        <td align="center">48.94</td>
        <td align="center">82.77</td>
        <td align="center">61.59</td>
        <td align="center">57.80</td>
        <td align="center">63.53</td>
    </tr>
    <tr>
        <td>Kanana-1.5-3B*</td>
        <td align="center">59.23</td>
        <td align="center">47.30</td>
        <td align="center">78.00</td>
        <td align="center">46.34</td>
        <td align="center">46.80</td>
        <td align="center">61.79</td>
    </tr>
</table>

<br>

### Instruct Model Evaluation
<table>
    <tr>
        <th>Models</th>
        <th>MT-Bench</th>
        <th>KoMT-Bench</th>
        <th>IFEval</th>
        <th>HumanEval+</th>
        <th>MBPP+</th>
        <th>GSM8K (0-shot)</th>
        <th>MATH</th>
        <th>MMLU (0-shot, CoT)</th>
        <th>KMMLU (0-shot, CoT)</th>
    </tr>
    <tr>
        <td><strong>Kanana-1.5-15.7B-A3B</strong></td>
        <td align="center">7.67</td>
        <td align="center">7.24</td>
        <td align="center">73.35</td>
        <td align="center">79.27</td>
        <td align="center">70.37</td>
        <td align="center">83.02</td>
        <td align="center">66.42</td>
        <td align="center">68.55</td>
        <td align="center">48.92</td>
    </tr>
    <tr>
        <td>Kanana-1.5-8B</td>
        <td align="center">7.76</td>
        <td align="center">7.63</td>
        <td align="center">80.11</td>
        <td align="center">76.83</td>
        <td align="center">67.99</td>
        <td align="center">87.64</td>
        <td align="center">67.54</td>
        <td align="center">68.82</td>
        <td align="center">48.28</td>
    </tr>
    <tr>
        <td>Kanana-1.5-3B*</td>
        <td align="center">7.01</td>
        <td align="center">6.52</td>
        <td align="center">70.08</td>
        <td align="center">70.73</td>
        <td align="center">64.29</td>
        <td align="center">80.36</td>
        <td align="center">56.70</td>
        <td align="center">59.69</td>
        <td align="center">37.60</td>
    </tr>
</table>

> [!Note]
> \* This model is not open-sourced; it is listed only for comparison with Kanana-1.5-15.7B-A3B.

<br>

### Evaluation Protocol
- Base Model Benchmarks
    - MMLU, KMMLU, HAERAE: 5-shot, log-likelihood
    - HumanEval: 0-shot, pass@1
    - MBPP: 3-shot, pass@1
    - GSM8K: 5-shot, exact-match (strict-match)

- Instruct Model Benchmarks
    - MT-Bench, KoMT-Bench: 0-shot, judged by `gpt-4o-2024-08-06`
    - IFEval: 0-shot, mean of strict prompt-level and strict instruction-level accuracy
    - HumanEval+, MBPP+: 0-shot, pass@1
    - GSM8K, MATH: 0-shot, rule-based verification
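The IFEval score is an average of two accuracies computed over the same responses. Written out, with $P$ prompts, $I_p$ the set of verifiable instructions in prompt $p$, and $f_{p,i} \in \{0, 1\}$ indicating whether instruction $i$ passes the strict check (notation introduced here for illustration):

$$
\text{Acc}_{\text{prompt}} = \frac{1}{P}\sum_{p=1}^{P}\prod_{i \in I_p} f_{p,i},
\qquad
\text{Acc}_{\text{inst}} = \frac{\sum_{p=1}^{P}\sum_{i \in I_p} f_{p,i}}{\sum_{p=1}^{P} |I_p|},
\qquad
\text{IFEval} = \frac{\text{Acc}_{\text{prompt}} + \text{Acc}_{\text{inst}}}{2}
$$

A prompt counts toward $\text{Acc}_{\text{prompt}}$ only if every one of its instructions is followed, whereas $\text{Acc}_{\text{inst}}$ gives partial credit per instruction.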

<br>

## Quickstart

### vLLM
- `vllm>=0.8.5` is required to run the `Kanana` models; the latest release is recommended.
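The version requirement above can be checked before launching the server. This is a minimal sketch; the helper names `version_ge` and `check_vllm_version` are hypothetical, and the comparison relies on `sort -V` (GNU coreutils), which does not follow PEP 440 ordering for pre-release tags such as `rc1`.

```bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# Uses `sort -V`; pre-release suffixes (e.g. "rc1") are not ordered per PEP 440.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: fail early if vLLM is missing or older than 0.8.5.
check_vllm_version() {
  local required="0.8.5" installed
  installed="$(python -c 'import vllm; print(vllm.__version__)' 2>/dev/null)" || {
    echo "vLLM is not installed; try: pip install \"vllm>=${required}\"" >&2
    return 1
  }
  if ! version_ge "$installed" "$required"; then
    echo "vLLM ${installed} found, but >= ${required} is required" >&2
    return 1
  fi
  echo "vLLM ${installed} OK"
}
```

Run `check_vllm_version`; if it reports a missing or outdated install, `pip install --upgrade "vllm>=0.8.5"` brings it up to date.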

#### Example Usage for `Kanana-1.5-15.7B-A3B-Base`
```bash
# Start the OpenAI-compatible server. This command runs in the foreground,
# so send the request below from a second terminal.
vllm serve "$path_to_model" \
        --served-model-name kanana-1.5-15.7b-a3b-base \
        --max-model-len 32768 \
        --gpu-memory-utilization 0.9 \
        --port 8000 \
        --dtype auto \
        --disable-cascade-attn

# Request a greedy completion ("top_k": 1) from the base model.
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "kanana-1.5-15.7b-a3b-base",
    "prompt": "Kakao is a leading company in South Korea, and it is known for ",
    "max_tokens": 32,
    "top_k": 1
}'

# Output (abridged):
# ...
# "choices":[{"index":0,"text":"1) its innovative technology, 2) its high-quality products, and 3) its strong brand image. The company has a long history of success,"...
# ...
```
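Loading the weights takes a while after `vllm serve` starts, and a completion request sent before the server is ready fails. A minimal sketch of a readiness check, assuming the server listens on the port used above and exposes vLLM's `/health` endpoint; the function name `wait_for_vllm`, its default timeout, and the 5-second polling interval are illustrative choices, not part of Kanana or vLLM.

```bash
# Hypothetical helper: poll vLLM's /health endpoint until the server is ready.
# Arguments (illustrative defaults): base URL, approximate maximum wait in seconds.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8000}"
  local timeout_s="${2:-600}"
  local waited=0
  # -f treats HTTP errors as failures; --max-time bounds each individual probe.
  until curl -sf --max-time 5 "${base_url}/health" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "vLLM server at ${base_url} not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "vLLM server at ${base_url} is ready"
}
```

For example, start `vllm serve` in the background with a trailing `&`, then run `wait_for_vllm http://localhost:8000 600 && curl -s http://localhost:8000/v1/models` to confirm that `kanana-1.5-15.7b-a3b-base` is listed before sending the completion request.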

<br>

## Contributors
- Language Model Training
  - Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu, Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Taegyeong Eo

<br>
 
## Citation
 
```
@misc{kananallmteam2025kananacomputeefficientbilinguallanguage,
      title={Kanana: Compute-efficient Bilingual Language Models}, 
      author={Kanana LLM Team and Yunju Bak and Hojin Lee and Minho Ryu and Jiyeon Ham and Seungjae Jung and Daniel Wontae Nam and Taegyeong Eo and Donghun Lee and Doohae Jung and Boseop Kim and Nayeon Kim and Jaesun Park and Hyunho Kim and Hyunwoong Ko and Changmin Lee and Kyoung-Woon On and Seulye Baeg and Junrae Cho and Sunghee Jung and Jieun Kang and EungGyun Kim and Eunhwa Kim and Byeongil Ko and Daniel Lee and Minchul Lee and Miok Lee and Shinbok Lee and Gaeun Seo},
      year={2025},
      eprint={2502.18934},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18934}, 
}
```

<br>

## Contact
- Kanana LLM Team Technical Support: [email protected]
- Business & Partnership Contact: [email protected]