amezasor committed on
Commit
45df10c
·
verified ·
1 Parent(s): 4afa1e1

eval results

  1. README.md +228 -17
README.md CHANGED
@@ -56,27 +56,238 @@ output = tokenizer.batch_decode(output)
56
  # print output
57
  print(output)
58
  ```
59
 
60
  **Model Architecture:**
61
  Granite-3.1-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
62
 
63
- | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
64
- | :-------- | :--------| :-------- | :------| :------|
65
- | Embedding size | **2048** | 4096 | 1024 | 1536 |
66
- | Number of layers | **40** | 40 | 24 | 32 |
67
- | Attention head size | **64** | 128 | 64 | 64 |
68
- | Number of attention heads | **32** | 32 | 16 | 24 |
69
- | Number of KV heads | **8** | 8 | 8 | 8 |
70
- | MLP hidden size | **8192** | 12800 | 512 | 512 |
71
- | MLP activation | **SwiGLU** | SwiGLU | SwiGLU | SwiGLU |
72
- | Number of experts | **—** | — | 32 | 40 |
73
- | MoE TopK | **—** | — | 8 | 8 |
74
- | Initialization std | **0.1** | 0.1 | 0.1 | 0.1 |
75
- | Sequence length | **128K** | 128K | 128K | 128K |
76
- | Position embedding | **RoPE** | RoPE | RoPE | RoPE |
77
- | # Parameters | **2.5B** | 8.1B | 1.3B | 3.3B |
78
- | # Active parameters | **2.5B** | 8.1B | 400M | 800M |
79
- | # Training tokens | **12T** | 12T | 10T | 10T |
80
 
81
  **Training Data:**
82
  This model is trained on a mix of open-source and proprietary data following a three-stage training strategy.
 
56
  # print output
57
  print(output)
58
  ```
59
+ **Evaluation Results:**
60
+ <table>
61
+ <caption><b>HuggingFace Open LLM Leaderboard V1</b></caption>
62
+ <thead>
63
+ <tr>
64
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
65
+ <th style="text-align:center; background-color: #001d6c; color: white;">ARC-Challenge</th>
66
+ <th style="text-align:center; background-color: #001d6c; color: white;">HellaSwag</th>
67
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
68
+ <th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
69
+ <th style="text-align:center; background-color: #001d6c; color: white;">Winogrande</th>
70
+ <th style="text-align:center; background-color: #001d6c; color: white;">GSM8K</th>
71
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
72
+ </tr></thead>
73
+ <tbody>
74
+ <tr>
75
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Granite-3.1-8B-Base</td>
76
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">63.99</td>
77
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">83.27</td>
78
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">63.45</td>
79
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">51.29</td>
80
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">78.92</td>
81
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">60.19</td>
82
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">66.85</td>
83
+ </tr>
84
+ <tr>
85
+ <td style="text-align:left; background-color: #DAE8FF; color: #2D2D2D;">Granite-3.1-2B-Base</td>
86
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">53.58</td>
87
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">77.67</td>
88
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">52.86</td>
89
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">39.02</td>
90
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">72.84</td>
91
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">47.99</td>
92
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">57.32</td>
93
+ </tr>
94
+ <tr>
95
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Base</td>
96
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.76</td>
97
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">74.45</td>
98
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">48.31</td>
99
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">39.91</td>
100
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">69.29</td>
101
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">40.56</td>
102
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">53.88</td>
103
+ </tr>
104
+ <tr>
105
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-1B-A400M-Base</td>
106
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">39.42</td>
107
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">66.13</td>
108
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">26.53</td>
109
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">37.67</td>
110
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">2.03</td>
111
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">18.87</td>
112
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">31.78</td>
113
+ </tr>
114
+ </tbody></table>
115
+
116
+ <table>
117
+ <caption><b>HuggingFace Open LLM Leaderboard V2</b></caption>
118
+ <thead>
119
+ <tr>
120
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
121
+ <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
122
+ <th style="text-align:center; background-color: #001d6c; color: white;">BBH</th>
123
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH Lvl 5</th>
124
+ <th style="text-align:center; background-color: #001d6c; color: white;">GPQA</th>
125
+ <th style="text-align:center; background-color: #001d6c; color: white;">MuSR</th>
126
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU-Pro</th>
127
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
128
+ </tr></thead>
129
+ <tbody>
130
+ <tr>
131
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Granite-3.1-8B-Base</td>
132
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">42.21</td>
133
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">26.02</td>
134
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">9.52</td>
135
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">9.51</td>
136
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.36</td>
137
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24.8</td>
138
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">20.07</td>
139
+ </tr>
140
+ <tr>
141
+ <td style="text-align:left; background-color: #DAE8FF; color: #2D2D2D;">Granite-3.1-2B-Base</td>
142
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">35.22</td>
143
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">16.84</td>
144
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">5.59</td>
145
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">3.69</td>
146
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">3.9</td>
147
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">13.9</td>
148
+ <td style="text-align:center; background-color: #DAE8FF; color: #2D2D2D;">13.19</td>
149
+ </tr>
150
+ <tr>
151
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Base</td>
152
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">29.96</td>
153
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">11.91</td>
154
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">4</td>
155
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">3.69</td>
156
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">1.11</td>
157
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">8.81</td>
158
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">9.91</td>
159
+ </tr>
160
+ <tr>
161
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-1B-A400M-Base</td>
162
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">25.19</td>
163
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">6.43</td>
164
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">2.19</td>
165
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">0.22</td>
166
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">1.76</td>
167
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">1.55</td>
168
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">6.22</td>
169
+ </tr>
170
+ </tbody></table>
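
The README does not say how the Avg column in the two leaderboard tables is computed. The numbers are consistent with an unweighted mean of the six benchmark scores in each row, and the sketch below checks that reading for the Granite-3.1-2B-Base rows. The names `SCORES_2B`, `REPORTED_AVG_2B`, and `mean_score` are illustrative and not part of any Granite or leaderboard tooling.

```python
# Granite-3.1-2B-Base scores copied from the two leaderboard tables above,
# in column order, without the final Avg column.
SCORES_2B = {
    "Open LLM Leaderboard V1": [53.58, 77.67, 52.86, 39.02, 72.84, 47.99],
    "Open LLM Leaderboard V2": [35.22, 16.84, 5.59, 3.69, 3.9, 13.9],
}

# Avg values as reported in the same tables.
REPORTED_AVG_2B = {
    "Open LLM Leaderboard V1": 57.32,
    "Open LLM Leaderboard V2": 13.19,
}


def mean_score(scores):
    """Unweighted arithmetic mean of a list of benchmark scores."""
    return sum(scores) / len(scores)


# Assumption: Avg is a plain mean over the six columns (not stated in the README).
for board, scores in SCORES_2B.items():
    computed = mean_score(scores)
    print(f"{board}: computed {computed:.2f}, reported {REPORTED_AVG_2B[board]:.2f}")
```

Under this assumption the V1 mean is roughly 57.327, so the reported 57.32 may be truncated rather than rounded; the V2 mean matches the reported value to two decimals.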
171
 
172
  **Model Architecture:**
173
  Granite-3.1-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
174
 
175
+ <table>
176
+ <thead>
177
+ <tr>
178
+ <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
179
+ <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
180
+ <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
181
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B MoE</th>
182
+ <th style="text-align:center; background-color: #001d6c; color: white;">3B MoE</th>
183
+ </tr></thead>
184
+ <tbody>
185
+ <tr>
186
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Embedding size</td>
187
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">2048</td>
188
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4096</td>
189
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1024</td>
190
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1536</td>
191
+ </tr>
192
+ <tr>
193
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of layers</td>
194
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">40</td>
195
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
196
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24</td>
197
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
198
+ </tr>
199
+ <tr>
200
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Attention head size</td>
201
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">64</td>
202
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
203
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
204
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
205
+ </tr>
206
+ <tr>
207
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of attention heads</td>
208
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">32</td>
209
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
210
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">16</td>
211
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24</td>
212
+ </tr>
213
+ <tr>
214
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of KV heads</td>
215
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8</td>
216
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
217
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
218
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
219
+ </tr>
220
+ <tr>
221
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP hidden size</td>
222
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8192</td>
223
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12800</td>
224
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">512</td>
225
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">512</td>
226
+ </tr>
227
+ <tr>
228
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP activation</td>
229
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">SwiGLU</td>
230
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
231
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
232
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
233
+ </tr>
234
+ <tr>
235
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of experts</td>
236
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">—</td>
237
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
238
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
239
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
240
+ </tr>
241
+ <tr>
242
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MoE TopK</td>
243
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">—</td>
244
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
245
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
246
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
247
+ </tr>
248
+ <tr>
249
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Initialization std</td>
250
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.1</td>
251
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
252
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
253
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
254
+ </tr>
255
+ <tr>
256
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Sequence length</td>
257
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">128K</td>
258
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
259
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
260
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
261
+ </tr>
262
+ <tr>
263
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
264
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">RoPE</td>
265
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
266
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
267
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
268
+ </tr>
269
+ <tr>
270
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
271
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">2.5B</td>
272
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.1B</td>
273
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.3B</td>
274
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">3.3B</td>
275
+ </tr>
276
+ <tr>
277
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Active parameters</td>
278
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">2.5B</td>
279
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8.1B</td>
280
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">400M</td>
281
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">800M</td>
282
+ </tr>
283
+ <tr>
284
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Training tokens</td>
285
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">12T</td>
286
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12T</td>
287
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10T</td>
288
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10T</td>
289
+ </tr>
290
+ </tbody></table>
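
As a rough sanity check, the 2B Dense column of the table can be turned into a parameter estimate and a KV-cache estimate. This is a minimal sketch under stated assumptions, not IBM's own accounting. The vocabulary size (49,155), the 2-byte (bf16) cache values, and reading 128K as 131,072 tokens are assumptions that do not appear in the table. The estimate also assumes bias-free linear layers and one RMSNorm weight vector per norm. `DenseConfig`, `count_params`, and `kv_cache_bytes` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseConfig:
    embed: int       # embedding size
    layers: int      # number of layers
    head_size: int   # attention head size
    heads: int       # number of attention (query) heads
    kv_heads: int    # number of KV heads (GQA)
    mlp_hidden: int  # MLP hidden size


# Values from the "2B Dense" column of the table above.
GRANITE_2B = DenseConfig(embed=2048, layers=40, head_size=64,
                         heads=32, kv_heads=8, mlp_hidden=8192)

ASSUMED_VOCAB_SIZE = 49_155          # assumption: not listed in the table
ASSUMED_BYTES_PER_VALUE = 2          # assumption: bf16 KV cache
ASSUMED_CONTEXT_TOKENS = 128 * 1024  # assumption: "128K" read as 131,072 tokens


def count_params(cfg, vocab_size):
    """Estimate parameters for a dense decoder with GQA, SwiGLU, RMSNorm, tied embeddings."""
    q_and_o = 2 * cfg.embed * cfg.heads * cfg.head_size     # query and output projections
    k_and_v = 2 * cfg.embed * cfg.kv_heads * cfg.head_size  # smaller under GQA
    mlp = 3 * cfg.embed * cfg.mlp_hidden                    # SwiGLU: gate, up, down
    norms = 2 * cfg.embed                                   # two RMSNorm weights per layer
    per_layer = q_and_o + k_and_v + mlp + norms
    final_norm = cfg.embed
    embeddings = vocab_size * cfg.embed                     # counted once: input/output shared
    return cfg.layers * per_layer + final_norm + embeddings


def kv_cache_bytes(cfg, tokens, bytes_per_value):
    """Bytes needed to cache keys and values for `tokens` positions of one sequence."""
    return 2 * cfg.layers * cfg.kv_heads * cfg.head_size * tokens * bytes_per_value


params = count_params(GRANITE_2B, ASSUMED_VOCAB_SIZE)
cache = kv_cache_bytes(GRANITE_2B, ASSUMED_CONTEXT_TOKENS, ASSUMED_BYTES_PER_VALUE)
print(f"estimated parameters: {params / 1e9:.2f}B")
print(f"KV cache at full context: {cache / 2**30:.1f} GiB")
```

With the assumed vocabulary size the estimate lands close to the 2.5B listed in the table. The cache estimate also shows why the table separates attention heads from KV heads: caching 8 KV heads instead of 32 cuts the per-sequence cache to a quarter.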
291
 
292
  **Training Data:**
293
  This model is trained on a mix of open-source and proprietary data following a three-stage training strategy.