+---
+language:
+- en
+- ko
+library_name: transformers
+license: unlicense
+pipeline_tag: text-generation
+model_id: kakaocorp/kanana-1.5-2.1b-base
+repo: kakaocorp/kanana-1.5-2.1b-base
+developers: KananaAlpha LLM
+training_regime: bf16 mixed precision
+results: '|   mmlu (5-shots) [acc] |   kmmlu-direct (5-shots) [exact_match] |   haerae (5-shots) [acc_norm] |   gsm8k (5-shots) [exact_match_strict] |   humaneval (0-shots) [pass@1] |   mbpp (3-shots) [pass@1] |
+|------------------------|----------------------------------------|-------------------------------|----------------------------------------|--------------------------------|---------------------------|
+|                  56.26 |                                  45.25 |                         76.72 |                                  53.60 |                          53.66 |                      53.66 |'
+model_summary: Kanana-1.5-2.1b-base is an auto-regressive language model that
+  uses an optimized transformer architecture. It uses a tokenizer
+  with a vocabulary of 128K tokens and supports a sequence length of 32K tokens.
+  Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.
+training_data: Kanana-1.5-2.1b-base was continually pretrained from kakaocorp/kanana-essence-2.1b-dus-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.
+model-index:
+- name: kanana-1.5-2.1b-base
+  results:
+  - task:
+      type: multiple_choice
+      name: mmlu
+    dataset:
+      name: mmlu (5-shots)
+      type: hails/mmlu_no_train
+    metrics:
+    - type: acc
+      value: 56.26
+      name: acc
+  - task:
+      type: generate_until
+      name: kmmlu
+    dataset:
+      name: kmmlu-direct (5-shots)
+      type: HAERAE-HUB/KMMLU
+    metrics:
+    - type: exact_match
+      value: 45.25
+      name: exact_match
+  - task:
+      type: multiple_choice
+      name: haerae
+    dataset:
+      name: haerae (5-shots)
+      type: HAERAE-HUB/HAE_RAE_BENCH
+    metrics:
+    - type: acc_norm
+      value: 76.72
+      name: acc_norm
+  - task:
+      type: generate_until
+      name: gsm8k
+    dataset:
+      name: gsm8k (5-shots)
+      type: openai/gsm8k
+    metrics:
+    - type: exact_match
+      value: 53.60
+      name: exact_match_strict
+  - task:
+      type: generate_until
+      name: humaneval
+    dataset:
+      name: humaneval (0-shots)
+      type: openai/openai_humaneval
+    metrics:
+    - type: pass@1
+      value: 53.66
+      name: pass@1
+  - task:
+      type: generate_until
+      name: mbpp
+    dataset:
+      name: mbpp (3-shots)
+      type: google-research-datasets/mbpp
+    metrics:
+    - type: pass@1
+      value: 53.66
+      name: pass@1
+---
+# Model Card for kakaocorp/kanana-1.5-2.1b-base
+Kanana-1.5-2.1b-base is an auto-regressive language model that uses an optimized transformer architecture. It uses a tokenizer with a vocabulary of 128K tokens and supports a sequence length of 32K tokens. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.
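The card lists `transformers` as the library and `text-generation` as the pipeline tag, so a plain completion call might look like the sketch below. It assumes `transformers` and `torch` are installed and that the checkpoint loads through the generic `Auto*` classes. Loading in bfloat16 is also an assumption: it mirrors the bf16 training regime but is not a documented requirement. The helper name `complete` is hypothetical.

```python
# Minimal sketch, not an official usage snippet from the model authors.
# Assumptions: the checkpoint loads via the generic Auto* classes, and
# bfloat16 weights are an acceptable choice for inference.
MODEL_ID = "kakaocorp/kanana-1.5-2.1b-base"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible output
        )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The capital of South Korea is")` downloads the weights on first use. Because this is a base model rather than an instruction-tuned one, it continues the text instead of following chat-style instructions.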
+## Model Details
+### Model Description
+- **Developed by:** KananaAlpha LLM
+- **Language(s) (NLP):** English, Korean
+- **License:** unlicense
+## Training Details
+### Training Data
+Kanana-1.5-2.1b-base was continually pretrained from kakaocorp/kanana-essence-2.1b-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.
+### Training Procedure
+#### Training Hyperparameters
+- **Training regime:** bf16 mixed precision
+## Evaluation
+### Results for General Tasks
+|   mmlu (5-shots) [acc] |   kmmlu-direct (5-shots) [exact_match] |   haerae (5-shots) [acc_norm] |   gsm8k (5-shots) [exact_match_strict] |   humaneval (0-shots) [pass@1] |   mbpp (3-shots) [pass@1] |
+|------------------------|----------------------------------------|-------------------------------|----------------------------------------|--------------------------------|---------------------------|
+|                  56.26 |                                  45.25 |                         76.72 |                                  53.60 |                          53.66 |                      53.66 |
+### Results for Long-Context Tasks
+| context length | ruler_niah_mk_2 [ruler_recall] | ruler_niah_mk_3 [ruler_recall] | ruler_niah_mv [ruler_recall] | json_kv [substring_exact_match] | niah [avg] | avg   |
+|----------------|--------------------------------|--------------------------------|------------------------------|---------------------------------|------------|-------|
+| 8192           | 100.00                         | 99.00                          | 97.00                        | 100.00                          | 98.92      | 98.98 |
+| 16384          | 99.00                          | 97.00                          | 95.75                        | 100.00                          | 99.21      | 98.19 |
+| 32768          | 95.00                          | 95.00                          | 86.00                        | 100.00                          | 99.07      | 95.01 |
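The card does not define the final `avg` column. The reported values are consistent with an unweighted mean of the five task columns in each row, which the short check below reproduces. This is an inference from the numbers in the table, not a statement of how the authors computed it; `ROWS` and `mean_score` are names introduced here for illustration.

```python
# Scores copied from the long-context table, in column order:
# ruler_niah_mk_2, ruler_niah_mk_3, ruler_niah_mv, json_kv, niah [avg].
# Each entry maps context length -> (task scores, reported avg).
ROWS = {
    8192: ([100.00, 99.00, 97.00, 100.00, 98.92], 98.98),
    16384: ([99.00, 97.00, 95.75, 100.00, 99.21], 98.19),
    32768: ([95.00, 95.00, 86.00, 100.00, 99.07], 95.01),
}


def mean_score(scores):
    """Unweighted arithmetic mean of one row's task scores."""
    return sum(scores) / len(scores)


for context_length, (scores, reported) in ROWS.items():
    computed = round(mean_score(scores), 2)
    print(f"{context_length}: computed={computed:.2f} reported={reported:.2f}")
```

Under this reading, the drop in `avg` at 32768 tokens comes mostly from `ruler_niah_mv`, which falls to 86.00 while `json_kv` stays at 100.00.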