---
language:
- en
- ko
library_name: transformers
license: other
license_name: "kanana"
license_link: LICENSE
pipeline_tag: text-generation
model_id: kakaocorp/kanana-1.5-15.7b-a3b-instruct
repo: kakaocorp/kanana-1.5-15.7b-a3b-instruct
developers: Kanana LLM
training_regime: bf16 mixed precision
---
🤗 [1.5 HF Models](https://kko.kakao.com/kananallm)   |   📕 [Kanana-1.5-15.7B-A3B Blog](https://tech.kakao.com/posts/716)
## News 🔥
- ✨`2025/07/24`: Published a [blog post](https://tech.kakao.com/posts/716) about `Kanana-1.5-15.7B-A3B` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📕`2025/05/23`: Published a [blog post](https://tech.kakao.com/posts/707) about `Kanana 1.5` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
- 📜`2025/02/27`: Released [Technical Report](https://arxiv.org/abs/2502.18934) and 🤗[HF model weights](https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259).
- 📕`2025/01/10`: Published a [blog post](https://tech.kakao.com/posts/682) about the development of `Kanana Nano` model.
- 📕`2024/11/14`: Published blog posts ([pre-training](https://tech.kakao.com/posts/661), [post-training](https://tech.kakao.com/posts/662)) about the development of `Kanana` models.
- ▶️`2024/11/06`: Published a [presentation video](https://youtu.be/HTBl142x9GI?si=o_we6t9suYK8DfX3) about the development of the `Kanana` models.
## Table of Contents
- [Kanana-1.5-15.7B-A3B](#kanana-15-157b-a3b)
- [Performance](#performance)
  - [Base Model Evaluation](#base-model-evaluation)
  - [Instruct Model Evaluation](#instruct-model-evaluation)
- [Contributors](#contributors)
- [Citation](#citation)
- [Contact](#contact)
# Kanana-1.5-15.7B-A3B
Introducing `Kanana-1.5-15.7B-A3B`, the first Mixture-of-Experts (MoE) model in our Kanana family, engineered for exceptional efficiency and strong performance. Thanks to its sparse architecture, `Kanana-1.5-15.7B-A3B` delivers capabilities comparable to the dense `Kanana-1.5-8B` model while using only 37% of the FLOPs per token, making it a highly inference-efficient and cost-effective solution for real-world applications. Furthermore, `Kanana-1.5-15.7B-A3B` is powered by our newly enhanced post-training strategy, which applies on-policy distillation followed by reinforcement learning.
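As a rough sanity check of the 37% figure: only about 3B of the 15.7B parameters are active per token (hence the "A3B" suffix), and decoder FLOPs per token scale approximately with the active parameter count. The sketch below makes these assumptions explicit; the exact active-parameter count is an approximation, not an official number.

```python
# Back-of-the-envelope check of the "37% of the FLOPs per token" claim,
# assuming FLOPs per token scale with the number of *active* parameters.
active_params_moe = 3.0e9  # ~3B active per token ("A3B"); approximate
dense_params = 8.0e9       # Kanana-1.5-8B dense baseline

print(f"FLOPs-per-token ratio: {active_params_moe / dense_params:.1%}")
# -> "FLOPs-per-token ratio: 37.5%", in line with the reported ~37%
```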
> [!Note]
> Neither the pre-training nor the post-training data includes Kakao user data.
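For reference, below is a minimal sketch of loading and querying the instruct model with 🤗 Transformers. The repo id comes from this card's metadata; the dtype, device map, and generation settings are illustrative assumptions rather than officially recommended defaults.

```python
# Minimal usage sketch with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kakaocorp/kanana-1.5-15.7b-a3b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # the card lists bf16 mixed-precision training
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain what a Mixture-of-Experts model is."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```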
## Performance
### Base Model Evaluation
Models | MMLU | KMMLU | HAERAE | HumanEval | MBPP | GSM8K |
---|---|---|---|---|---|---|
Kanana-1.5-15.7B-A3B | 64.79 | 51.77 | 83.23 | 59.76 | 60.10 | 61.18 |
Kanana-1.5-8B | 64.24 | 48.94 | 82.77 | 61.59 | 57.80 | 63.53 |
Kanana-1.5-3B* | 59.23 | 47.30 | 78.00 | 46.34 | 46.80 | 61.79 |

### Instruct Model Evaluation

Models | MT-Bench | KoMT-Bench | IFEval | HumanEval+ | MBPP+ | GSM8K (0-shot) | MATH | MMLU (0-shot, CoT) | KMMLU (0-shot, CoT) |
---|---|---|---|---|---|---|---|---|---|
Kanana-1.5-15.7B-A3B | 7.67 | 7.24 | 73.35 | 79.27 | 70.37 | 83.02 | 66.42 | 68.55 | 48.92 |
Kanana-1.5-8B | 7.76 | 7.63 | 80.11 | 76.83 | 67.99 | 87.64 | 67.54 | 68.82 | 48.28 |
Kanana-1.5-3B* | 7.01 | 6.52 | 70.08 | 70.73 | 64.29 | 80.36 | 56.70 | 59.69 | 37.60 |