---
language:
- en
- ko
library_name: transformers
license: other
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-1.5-15.7b-a3b-base
repo: kakaocorp/kanana-1.5-15.7b-a3b-base
developers: Kanana LLM
training_regime: bf16 mixed precision
---

🤗 1.5 HF Models | 📕 Kanana-1.5-15.7B-A3B Blog

## News 🔥

- ✨`2025/07/24`: Published a [blog post](https://tech.kakao.com/posts/716) about `Kanana-1.5-15.7B-A3B` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📕`2025/05/23`: Published a [blog post](https://tech.kakao.com/posts/707) about `Kanana 1.5` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📜`2025/02/27`: Released [Technical Report](https://arxiv.org/abs/2502.18934) and 🤗[HF model weights](https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259).
- 📕`2025/01/10`: Published a [blog post](https://tech.kakao.com/posts/682) about the development of `Kanana Nano` model.
- 📕`2024/11/14`: Published blog posts ([pre-training](https://tech.kakao.com/posts/661), [post-training](https://tech.kakao.com/posts/662)) about the development of `Kanana` models.
- ▶️`2024/11/06`: Published a [presentation video](https://youtu.be/HTBl142x9GI?si=o_we6t9suYK8DfX3) about the development of the `Kanana` models.

## Table of Contents

- [Kanana-1.5-15.7B-A3B](#kanana-15-157b-a3b)
    - [Performance](#performance)
        - [Base Model Evaluation](#base-model-evaluation)
        - [Instruct Model Evaluation](#instruct-model-evaluation)
    - [Contributors](#contributors)
    - [Citation](#citation)
    - [Contact](#contact)

# Kanana-1.5-15.7B-A3B

Introducing `Kanana-1.5-15.7B-A3B`, the first Mixture-of-Experts (MoE) model in our Kanana family, engineered for efficiency and strong performance. With its sparse architecture, `Kanana-1.5-15.7B-A3B` delivers capabilities comparable to the dense `Kanana-1.5-8B` model while using only 37% of the FLOPs per token, making it an inference-efficient and cost-effective option for real-world applications. Furthermore, `Kanana-1.5-15.7B-A3B` is powered by our newly enhanced post-training strategy, which consists of on-policy distillation followed by reinforcement learning.

> [!Note]
> Neither the pre-training nor the post-training data includes Kakao user data.

## Performance

### Base Model Evaluation

| Models | MMLU | KMMLU | HAERAE | HumanEval | MBPP | GSM8K |
|---|---|---|---|---|---|---|
| Kanana-1.5-15.7B-A3B | 64.79 | 51.77 | 83.23 | 59.76 | 60.10 | 61.18 |
| Kanana-1.5-8B | 64.24 | 48.94 | 82.77 | 61.59 | 57.80 | 63.53 |
| Kanana-1.5-3B* | 59.23 | 47.30 | 78.00 | 46.34 | 46.80 | 61.79 |

### Instruct Model Evaluation

| Models | MT-Bench | KoMT-Bench | IFEval | HumanEval+ | MBPP+ | GSM8K (0-shot) | MATH | MMLU (0-shot, CoT) | KMMLU (0-shot, CoT) |
|---|---|---|---|---|---|---|---|---|---|
| Kanana-1.5-15.7B-A3B | 7.67 | 7.24 | 73.35 | 79.27 | 70.37 | 83.02 | 66.42 | 68.55 | 48.92 |
| Kanana-1.5-8B | 7.76 | 7.63 | 80.11 | 76.83 | 67.99 | 87.64 | 67.54 | 68.82 | 48.28 |
| Kanana-1.5-3B* | 7.01 | 6.52 | 70.08 | 70.73 | 64.29 | 80.36 | 56.70 | 59.69 | 37.60 |

> [!Note]
> \* This model is not open-sourced; it is listed only for comparison with Kanana-1.5-15.7B-A3B.

### Evaluation Protocol

- Base Model Benchmarks
    - MMLU, KMMLU, HAE-RAE: 5-shot, log-likelihood
    - HumanEval: 0-shot, pass@1
    - MBPP: 3-shot, pass@1
    - GSM8K: 5-shot, exact-match (strict-match)
- Instruct Model Benchmarks
    - MT-Bench, KoMT-Bench: 0-shot, gpt-4o-2024-08-06 as judge model
    - IFEval: 0-shot, mean of strict-prompt-level and strict-instruction-level
    - HumanEval+, MBPP+: 0-shot, pass@1
    - GSM8K, MATH: 0-shot, rule-based verification
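
The IFEval entry above averages two strict accuracies. As a rough illustration of how such an aggregate could be computed, here is a minimal Python sketch. It assumes the usual IFEval definitions (a prompt counts as passed only if every instruction attached to it is followed; instruction-level accuracy pools all instructions), and the function name and input layout are hypothetical, not taken from the Kanana evaluation code.

```python
from typing import Sequence


def ifeval_strict_score(results: Sequence[Sequence[bool]]) -> float:
    """Mean of strict prompt-level and strict instruction-level accuracy (in %).

    Assumption: results[i][j] is True when the response to prompt i satisfied
    its j-th instruction under strict checking. This layout is illustrative.
    """
    if not results or any(len(per_prompt) == 0 for per_prompt in results):
        raise ValueError("each prompt needs at least one instruction result")

    # Prompt-level: a prompt passes only if all of its instructions pass.
    prompt_level = sum(all(per_prompt) for per_prompt in results) / len(results)

    # Instruction-level: fraction of followed instructions over all prompts.
    total_instructions = sum(len(per_prompt) for per_prompt in results)
    instruction_level = sum(sum(per_prompt) for per_prompt in results) / total_instructions

    return 100.0 * (prompt_level + instruction_level) / 2
```

For example, with three prompts whose instruction outcomes are `[[True, True], [True, False], [False]]`, prompt-level accuracy is 1/3 and instruction-level accuracy is 3/5, so the combined score is about 46.67.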

## Quickstart

### vLLM

- `vllm>=0.8.5` is required to run `Kanana` models.

#### Example Usage for `Kanana-1.5-15.7B-A3B-Base`

```bash
# Start an OpenAI-compatible server for the base model.
vllm serve $path_to_model \
    --served_model_name kanana-1.5-15.7b-a3b-base \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9 \
    --port 8000 \
    --dtype auto \
    --disable_cascade_attn

# Request a greedy completion (top_k=1) from the running server.
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "kanana-1.5-15.7b-a3b-base",
    "prompt": "Kakao is a leading company in South Korea, and it is known for ",
    "max_tokens": 32,
    "top_k": 1
}'

# Output:
'''
...
"choices":[{"index":0,"text":"1) its innovative technology, 2) its high-quality products, and 3) its strong brand image. The company has a long history of success,"...
...
'''
```
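
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above; it assumes the server started by `vllm serve` is reachable at `http://localhost:8000`, and the helper names `build_payload` and `complete` are illustrative rather than part of vLLM.

```python
import json
import urllib.request


def build_payload(prompt, model="kanana-1.5-15.7b-a3b-base", max_tokens=32, top_k=1):
    """Build the JSON body used in the curl example (top_k=1 means greedy decoding)."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "top_k": top_k,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """POST the prompt to /v1/completions and return the first generated text.

    Assumption: a vLLM server is already running at base_url.
    """
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Kakao is a leading company in South Korea, and it is known for ")` should return the continuation text from the first entry in `choices`.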

## Contributors

- Language Model Training
    - Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu, Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Taegyeong Eo

## Citation

```
@misc{kananallmteam2025kananacomputeefficientbilinguallanguage,
      title={Kanana: Compute-efficient Bilingual Language Models},
      author={Kanana LLM Team and Yunju Bak and Hojin Lee and Minho Ryu and Jiyeon Ham and Seungjae Jung and Daniel Wontae Nam and Taegyeong Eo and Donghun Lee and Doohae Jung and Boseop Kim and Nayeon Kim and Jaesun Park and Hyunho Kim and Hyunwoong Ko and Changmin Lee and Kyoung-Woon On and Seulye Baeg and Junrae Cho and Sunghee Jung and Jieun Kang and EungGyun Kim and Eunhwa Kim and Byeongil Ko and Daniel Lee and Minchul Lee and Miok Lee and Shinbok Lee and Gaeun Seo},
      year={2025},
      eprint={2502.18934},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18934},
}
```

## Contact

- Kanana LLM Team Technical Support: kanana-llm@kakaocorp.com
- Business & Partnership Contact: alpha.k@kakaocorp.com
