wavy-jung committed (verified)
Commit ad150d6 · 1 Parent(s): 7526ffd

Create README.md

Files changed (1)
  1. README.md +133 -0
README.md ADDED

---
language:
- en
- ko
library_name: transformers
license: unlicense
pipeline_tag: text-generation
model_id: kakaocorp/kanana-1.5-2.1b-base
repo: kakaocorp/kanana-1.5-2.1b-base
developers: KananaAlpha LLM
training_regime: bf16 mixed precision
results: '| mmlu (5-shots) [acc] | kmmlu-direct (5-shots) [exact_match] | haerae (5-shots) [acc_norm] | gsm8k (5-shots) [exact_match_strict] | humaneval (0-shots) [pass@1] | mbpp (3-shots) [pass@1] |
  |------------------------|----------------------------------------|-------------------------------|----------------------------------------|--------------------------------|---------------------------|
  | 56.26 | 45.25 | 76.72 | 53.60 | 53.66 | 53.66 |'
model_summary: Kanana-1.5-2.1b-base is an auto-regressive language model that
  uses an optimized transformer architecture. Kanana-1.5-2.1b-base uses a tokenizer
  with a vocabulary of 128K tokens and supports a sequence length of 32K.
  Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.
training_data: Kanana-1.5-2.1b-base was continuously pretrained from kakaocorp/kanana-essence-2.1b-dus-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.
model-index:
- name: kanana-1.5-2.1b-base
  results:
  - task:
      type: multiple_choice
      name: mmlu
    dataset:
      name: mmlu (5-shots)
      type: hails/mmlu_no_train
    metrics:
    - type: acc
      value: 56.26
      name: acc
  - task:
      type: generate_until
      name: kmmlu
    dataset:
      name: kmmlu-direct (5-shots)
      type: HAERAE-HUB/KMMLU
    metrics:
    - type: exact_match
      value: 45.25
      name: exact_match
  - task:
      type: multiple_choice
      name: haerae
    dataset:
      name: haerae (5-shots)
      type: HAERAE-HUB/HAE_RAE_BENCH
    metrics:
    - type: acc_norm
      value: 76.72
      name: acc_norm
  - task:
      type: generate_until
      name: gsm8k
    dataset:
      name: gsm8k (5-shots)
      type: openai/gsm8k
    metrics:
    - type: exact_match
      value: 53.60
      name: exact_match_strict
  - task:
      type: generate_until
      name: humaneval
    dataset:
      name: humaneval (0-shots)
      type: openai/openai_humaneval
    metrics:
    - type: pass@1
      value: 53.66
      name: pass@1
  - task:
      type: generate_until
      name: mbpp
    dataset:
      name: mbpp (3-shots)
      type: google-research-datasets/mbpp
    metrics:
    - type: pass@1
      value: 53.66
      name: pass@1
---
# Model Card for kakaocorp/kanana-1.5-2.1b-base

<!-- Provide a quick summary of what the model is/does. -->

Kanana-1.5-2.1b-base is an auto-regressive language model that uses an optimized transformer architecture. Kanana-1.5-2.1b-base uses a tokenizer with a vocabulary of 128K tokens and supports a sequence length of 32K. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.

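Since the card is tagged `library_name: transformers` and `pipeline_tag: text-generation`, a minimal usage sketch follows. It assumes the checkpoint loads through the standard `AutoTokenizer`/`AutoModelForCausalLM` classes; the prompt and generation settings are arbitrary examples, not recommendations from the model authors.

```python
# Minimal usage sketch (assumption: the checkpoint works with the standard
# transformers causal-LM API; generation settings are illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kakaocorp/kanana-1.5-2.1b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # matches the bf16 training regime noted below
    device_map="auto",
)

# This is a base (pre-trained) model, so prompts are plain text continuations
# rather than chat-formatted messages.
prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
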
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** KananaAlpha LLM
- **Language(s) (NLP):** English (en), Korean (ko)
- **License:** unlicense

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Kanana-1.5-2.1b-base was continuously pretrained from kakaocorp/kanana-essence-2.1b-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision (see the sketch below) <!-- fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

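As an illustration of the bf16 mixed-precision regime listed above, the following is a generic PyTorch-style training step. It is not the authors' training code; `model`, `batch`, and `optimizer` are placeholders, and the only detail grounded in the card is the use of bfloat16.

```python
# Illustrative bf16 mixed-precision training step (not the authors' actual code).
import torch

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    # The forward pass computes in bfloat16 via autocast while parameters stay
    # in full precision. bf16 keeps fp32's exponent range, so no gradient
    # scaler is needed (unlike fp16 mixed precision).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss  # assumes an HF-style model that returns .loss
    loss.backward()
    optimizer.step()
    return loss.item()
```
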
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Results for General Tasks

| MMLU (5-shot) [acc] | KMMLU-direct (5-shot) [exact_match] | HAE-RAE (5-shot) [acc_norm] | GSM8K (5-shot) [exact_match_strict] | HumanEval (0-shot) [pass@1] | MBPP (3-shot) [pass@1] |
|---------------------|-------------------------------------|-----------------------------|-------------------------------------|-----------------------------|------------------------|
| 56.26 | 45.25 | 76.72 | 53.60 | 53.66 | 53.66 |

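The task types (`multiple_choice`, `generate_until`) and dataset sources (`hails/mmlu_no_train`, `HAERAE-HUB/KMMLU`, ...) in the metadata above follow lm-evaluation-harness conventions, although the card does not document the exact harness version or settings. A rough reproduction sketch, assuming EleutherAI's `lm-evaluation-harness` with its stock task names, could look like this; the task list, few-shot count, and batch size are assumptions.

```python
# Hypothetical reproduction sketch with EleutherAI's lm-evaluation-harness.
# Task names, few-shot count, and batch size are assumptions; the card does not
# document the exact evaluation configuration.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=kakaocorp/kanana-1.5-2.1b-base,dtype=bfloat16",
    tasks=["mmlu", "gsm8k"],  # both reported as 5-shot in the table above
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```

Code-generation benchmarks such as HumanEval and MBPP additionally require opting in to executing model-generated code, so they are omitted from this sketch.
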
### Results for Long-Context Tasks

| Context length | ruler_niah_mk_2 [ruler_recall] | ruler_niah_mk_3 [ruler_recall] | ruler_niah_mv [ruler_recall] | json_kv [substring_exact_match] | niah [avg] | avg |
|----------------|--------------------------------|--------------------------------|------------------------------|---------------------------------|------------|-----|
| 8192 | 100.00 | 99.00 | 97.00 | 100.00 | 98.92 | 98.98 |
| 16384 | 99.00 | 97.00 | 95.75 | 100.00 | 99.21 | 98.19 |
| 32768 | 95.00 | 95.00 | 86.00 | 100.00 | 99.07 | 95.01 |