wavy-jung committed
Commit c0b13a2 · verified · 1 Parent(s): 0ec2cdf

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +40 -52
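For context, a commit message like this one is typically produced by the `huggingface_hub` client. A minimal sketch of how such an upload might be issued is below; the repo id comes from this model card's front matter, while the local folder path and token handling are illustrative assumptions, not details of this commit.

```python
# Hedged sketch of an upload that yields a commit like "Upload folder using huggingface_hub".
# The folder_path is an assumed local working copy; authentication uses the
# token stored by `huggingface-cli login`.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="kakaocorp/kanana-1.5-2.1b-base",  # repo named in the front matter below
    repo_type="model",
    folder_path="./kanana-1.5-2.1b-base",      # illustrative local path
    commit_message="Upload folder using huggingface_hub",
)
```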
README.md CHANGED
@@ -10,7 +10,7 @@ repo: kakaocorp/kanana-1.5-2.1b-base
  developers: Kanana LLM
  training_regime: bf16 mixed precision
  ---
- # Kanana
+
  <p align="center">
  <br>
  <picture>
@@ -20,14 +20,14 @@ training_regime: bf16 mixed precision
  </picture>
  </br>
  <p align="center">
- 🤗 <a href="https://kko.kakao.com/kananallm">HF Models</a> &nbsp |
- &nbsp 📕 <a href="https://tech.kakao.com/posts/707">Blog Post</a> &nbsp |
+ 🤗 <a href="https://kko.kakao.com/kananallm">1.5 HF Models</a> &nbsp |
+ &nbsp 📕 <a href="https://tech.kakao.com/posts/707">1.5 Blog Post</a> &nbsp |
  &nbsp 📜 <a href="https://arxiv.org/abs/2502.18934">Technical Report</a>
 
 
  <br>
 
- ## 🔥 News
+ ## News 🔥
 
  - ✨`2025/05/23`: Published a [blog post](https://tech.kakao.com/posts/707) about `Kanana 1.5` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
  - 📜`2025/02/27`: Released [Technical Report](https://arxiv.org/abs/2502.18934) and 🤗[HF model weights](https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259).
@@ -41,19 +41,19 @@ training_regime: bf16 mixed precision
 
  - [Kanana 1.5](#kanana-15)
  - [Performance](#performance)
- - [Kanana 1.5 Base Models](#kanana-15-base-models)
- - [Kanana 1.5 Instruct Models](#kanana-15-instruct-models)
+ - [Base Model Evaluation](#base-model-evaluation)
+ - [Instruct Model Evaluation](#instruct-model-evaluation)
  - [Long Context](#long-context)
  - [Processing 32K+ Length](#processing-32k-length)
+ - [Contributors](#contributors)
  - [Kanana 1.0](#kanana-10)
- - [License](#license)
  - [Citation](#citation)
- - [Contributors](#contributors)
  - [Contact](#contact)
 
  <br>
 
- ## Kanana 1.5
+ # Kanana 1.5
+
  `Kanana 1.5`, a newly introduced version of the Kanana model family, presents substantial enhancements in **coding, mathematics, and function calling capabilities** over the previous version, enabling broader application to more complex real-world problems. This new version can now handle __up to 32K tokens natively and up to 128K tokens using YaRN__, allowing the model to maintain coherence when handling extensive documents or engaging in extended conversations. Furthermore, Kanana 1.5 delivers more natural and accurate conversations through a __refined post-training process__.
 
  <p align="center">
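As a point of reference for the capabilities described in the paragraph above, here is a minimal sketch of loading the base checkpoint named in this repo's front matter with 🤗 Transformers; the dtype, device placement, and generation settings are illustrative assumptions rather than settings prescribed by the README.

```python
# Hedged sketch: load kakaocorp/kanana-1.5-2.1b-base (the repo in the front matter)
# and run a short generation. bf16 mirrors the card's stated training precision;
# device_map and max_new_tokens are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kakaocorp/kanana-1.5-2.1b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Kakao is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```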
@@ -64,11 +64,9 @@ training_regime: bf16 mixed precision
  > [!Note]
  > Neither the pre-training nor the post-training data includes Kakao user data.
 
- <br>
-
- ### Performance
+ ## Performance
 
- #### Kanana 1.5 Base Models
+ ### Base Model Evaluation
  <table>
  <tr>
  <th>Models</th>
@@ -162,8 +160,9 @@ training_regime: bf16 mixed precision
  </tr>
  </table>
 
+ <br>
 
- #### Kanana 1.5 Instruct Models
+ ### Instruct Model Evaluation
  <table>
  <tr>
  <th>Models</th>
@@ -299,8 +298,10 @@ training_regime: bf16 mixed precision
 
  > \* Models released under the Apache 2.0 license have been trained on more recent data compared to other models.
 
- #### Long Context
- ##### Kanana-1.5-32.5B-Base
+ <br>
+
+ ### Long Context
+ #### Kanana-1.5-32.5B-Base
  Below is the Needle-in-a-Haystack performance of the `Kanana-1.5-32.5B-Base` model, which was trained on a target context length of 32K.
  - (left): evaluated with native 32K context length
  - (right): extended to 128K context length using YaRN
@@ -310,7 +311,7 @@ Below is a Needle-in-a-Haystack performance of `Kanana-1.5-32.5B-Base` model whi
  <img src="assets/performance/niah-32.5b-base.png" width="1000" style="margin: 40px auto;">
  </picture>
 
- ##### Kanana-1.5-32.5B-Instruct
+ #### Kanana-1.5-32.5B-Instruct
  Below is the Needle-in-a-Haystack performance of the `Kanana-1.5-32.5B-Instruct` model, which was trained on a target context length of 32K.
  - (left): evaluated with native 32K context length
  - (right): extended to 128K context length using YaRN
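For readers unfamiliar with the evaluation referenced in these two subsections, here is a toy sketch of how a needle-in-a-haystack probe is typically constructed; the filler text, needle wording, and retrieval question are illustrative assumptions, not the harness actually used for the plots in this README.

```python
# Toy needle-in-a-haystack probe: bury one fact ("needle") at a chosen depth
# inside filler text ("haystack") and ask the model to retrieve it.
# All strings and the tokens-per-sentence estimate are illustrative.
def build_niah_prompt(context_tokens: int, depth: float) -> str:
    filler = "The sky was clear and the market was quiet that day. "
    needle = "The secret passcode for the archive is 7421. "
    n_sentences = max(context_tokens // 12, 1)  # rough ~12 tokens per sentence
    sentences = [filler] * n_sentences
    sentences.insert(int(n_sentences * depth), needle)
    question = "\n\nWhat is the secret passcode for the archive? Answer with the number only."
    return "".join(sentences) + question

prompt = build_niah_prompt(context_tokens=32_000, depth=0.5)
```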
@@ -322,7 +323,7 @@ Below is a Needle-in-a-Haystack performance of `Kanana-1.5-32.5B-Instruct` model
 
  <br>
 
- #### Processing 32K+ Length
+ ## Processing 32K+ Length
  Currently, the `config.json` uploaded to HuggingFace is configured for token lengths of 32,768 or less. To process tokens beyond this length, YaRN must be applied. By updating the `config.json` with the following parameters, you can apply YaRN to handle token sequences up to 128K in length:
  ```json
  "rope_scaling": {
@@ -336,13 +337,19 @@ Currently, the `config.json` uploaded to HuggingFace is configured for token len
 
  <br>
 
+ ## Contributors
+ - Language Model Training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
+ - Language Model Alignment: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam
+ - AI Engineering: Youmin Kim, Hyeongju Kim
 
- ## Kanana 1.0
-
- ### Kanana 1.0 Introduction
+ <br>
 
+ # Kanana 1.0
  <details>
  <summary>View the details about Kanana 1.0</summary>
+
+ <br>
+
  We introduce Kanana, a series of bilingual language models (developed by [Kakao](https://github.com/kakao)) that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, function calling, and Retrieval Augmented Generation (RAG). The Kanana model series spans from 2.1B to 32.5B parameters with 2.1B models (base, instruct, embedding, function call, and RAG) publicly released to promote research on Korean language models.
 
  > Neither the pre-training nor the post-training data includes Kakao user data.
@@ -352,16 +359,11 @@ We introduce Kanana, a series of bilingual language models (developed by [Kakao]
  <img src="assets/performance/flops-vs-mmlu.jpg" width="700" style="margin: 40px auto;">
  </picture>
 
- </details>
-
-
- ### Kanana 1.0 Models Performance
- <details>
- <summary>View detailed performance of Kanana 1.0 models</summary>
+ ## Performance
 
  Below is a partial report on the performance of the `Kanana` model series. Please refer to the [Technical Report](https://arxiv.org/abs/2502.18934) for the full results.
 
- #### Pre-trained Model Performance
+ ### Base Model Evaluation
 
  <table>
  <tr>
@@ -552,9 +554,9 @@ Below are partial report on the performance of the `Kanana` model series. Please
  <br>
 
 
- #### Post-trained Model Performance
+ ### Instruct Model Evaluation
 
- ##### Instruction-following Benchmarks
+ #### Instruction-following Benchmarks
  <table>
  <tr>
  <th>Models</th>
@@ -724,7 +726,7 @@ Below are partial report on the performance of the `Kanana` model series. Please
 
  <br>
 
- ##### General Benchmarks
+ #### General Benchmarks
 
  <table>
  <tr>
@@ -933,7 +935,7 @@ Below are partial report on the performance of the `Kanana` model series. Please
 
  <br>
 
- #### Embedding Model Performance
+ ### Embedding Model Performance
  <table>
  <tr>
  <td align="center">Backbone</td>
@@ -969,13 +971,15 @@ Below are partial report on the performance of the `Kanana` model series. Please
  </tr>
  </table>
 
- </details>
+ <br>
 
+ ## Contributors
 
- <br>
+ - Pre-training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
+ - Post-training: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam, Kyoung-Woon On
+ - Adaptation: Seulye Baeg, Junrae Cho, Taegyeong Eo, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, Donghun Lee, Minchul Lee, Miok Lee, Shinbok Lee, Minho Ryu, Gaeun Seo
 
- ## License
- The `Kanana 1.5` models are licensed under [apache-2.0](./LICENSE).
+ </details>
 
  <br>
 
@@ -995,22 +999,6 @@ The `Kanana 1.5` models are licensed under [apache-2.0](./LICENSE).
 
  <br>
 
- ## Contributors
- - Language Model Training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
- - Language Model Alignment: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam
- - AI Engineering: Youmin Kim, Hyeongju Kim
-
- <details>
- <summary>Contributors for Kanana 1.0</summary>
-
- - Pre-training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
- - Post-training: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam, Kyoung-Woon On
- - Adaptation: Seulye Baeg, Junrae Cho, Taegyeong Eo, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, Donghun Lee, Minchul Lee, Miok Lee, Shinbok Lee, Minho Ryu, Gaeun Seo
-
- </details>
-
- <br>
-
  ## Contact
  - Kanana LLM Team Technical Support: [email protected]
  - Business & Partnership Contact: [email protected]