amezasor committed e809e8f (verified) · 1 parent: 7f4bb8b

evaluation results

Files changed (1)
  1. README.md +228 -17
README.md CHANGED
@@ -76,27 +76,238 @@ output = tokenizer.batch_decode(output)
  # print output
  print(output)
  ```
 
  **Model Architecture:**
  Granite-3.1-8B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
 
- | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
- | :-------- | :--------| :-------- | :------| :------|
- | Embedding size | 2048 | **4096** | 1024 | 1536 |
- | Number of layers | 40 | **40** | 24 | 32 |
- | Attention head size | 64 | **128** | 64 | 64 |
- | Number of attention heads | 32 | **32** | 16 | 24 |
- | Number of KV heads | 8 | **8** | 8 | 8 |
- | MLP hidden size | 8192 | **12800** | 512 | 512 |
- | MLP activation | SwiGLU | **SwiGLU** | SwiGLU | SwiGLU |
- | Number of experts | — | **—** | 32 | 40 |
- | MoE TopK | — | **—** | 8 | 8 |
- | Initialization std | 0.1 | **0.1** | 0.1 | 0.1 |
- | Sequence length | 128K | **128K** | 128K | 128K |
- | Position embedding | RoPE | **RoPE** | RoPE | RoPE |
- | # Parameters | 2.5B | **8.1B** | 1.3B | 3.3B |
- | # Active parameters | 2.5B | **8.1B** | 400M | 800M |
- | # Training tokens | 12T | **12T** | 10T | 10T |
 
  **Training Data:**
  Overall, our SFT data comes largely from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 3.1 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf).
 
  # print output
  print(output)
  ```
+ **Evaluation Results:**
+ <table>
+ <caption><b>HuggingFace Open LLM Leaderboard V1</b></caption>
+ <thead>
+ <tr>
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">ARC-Challenge</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Hellaswag</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Winogrande</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">GSM8K</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">62.62</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">84.48</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">65.34</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">66.23</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">75.37</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">73.84</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">71.31</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">54.61</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">75.14</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">55.31</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.42</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.48</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.76</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">60.79</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Instruct</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.42</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">73.01</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.19</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">49.71</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">64.87</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">48.97</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">56.53</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-1B-A400M-Instruct</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">42.66</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">65.97</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">26.13</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">46.77</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">62.35</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">33.88</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">46.29</td>
+ </tr>
+ </tbody></table>
+
+ <table>
+ <caption><b>HuggingFace Open LLM Leaderboard V2</b></caption>
+ <thead>
+ <tr>
+ <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">BBH</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH Lvl 5</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">GPQA</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">MUSR</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">MMLU-Pro</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Avg</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">72.08</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">34.09</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">21.68</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8.28</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">19.01</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">28.19</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.55</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">62.86</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">21.82</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">11.33</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">5.26</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">4.87</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">20.21</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">21.06</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-3B-A800M-Instruct</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">55.16</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">16.69</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">10.35</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">5.15</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">2.51</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">12.75</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">17.10</td>
179
+ </tr>
180
+ <tr>
181
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-1B-A400M-Instruct</td>
182
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">46.86</td>
183
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">6.18</td>
184
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">4.08</td>
185
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">0</td>
186
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">0.78</td>
187
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">2.41</td>
188
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">10.05</td>
189
+ </tr>
190
+ </tbody></table>
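Reading the numbers, each Avg value in the two leaderboard tables agrees, to within 0.01, with the unweighted mean of the six benchmark scores in its row. The model card does not state this rule, so treat it as an observation rather than the official scoring method. A minimal Python sketch that recomputes the column from the listed scores:

```python
from statistics import mean

# Scores copied from the two leaderboard tables, in column order:
#   V1: ARC-Challenge, Hellaswag, MMLU, TruthfulQA, Winogrande, GSM8K
#   V2: IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-Pro
SCORES = {
    "V1": {
        "Granite-3.1-8B-Instruct": [62.62, 84.48, 65.34, 66.23, 75.37, 73.84],
        "Granite-3.1-2B-Instruct": [54.61, 75.14, 55.31, 59.42, 67.48, 52.76],
        "Granite-3.1-3B-A800M-Instruct": [50.42, 73.01, 52.19, 49.71, 64.87, 48.97],
        "Granite-3.1-1B-A400M-Instruct": [42.66, 65.97, 26.13, 46.77, 62.35, 33.88],
    },
    "V2": {
        "Granite-3.1-8B-Instruct": [72.08, 34.09, 21.68, 8.28, 19.01, 28.19],
        "Granite-3.1-2B-Instruct": [62.86, 21.82, 11.33, 5.26, 4.87, 20.21],
        "Granite-3.1-3B-A800M-Instruct": [55.16, 16.69, 10.35, 5.15, 2.51, 12.75],
        "Granite-3.1-1B-A400M-Instruct": [46.86, 6.18, 4.08, 0.0, 0.78, 2.41],
    },
}


def row_averages(board):
    """Unweighted mean of the six benchmark scores for each model."""
    return {model: mean(scores) for model, scores in board.items()}


# Print one line per leaderboard and model, rounded to two decimals.
for board_name, board in SCORES.items():
    for model, avg in row_averages(board).items():
        print(f"{board_name}  {model:<30} {avg:6.2f}")
```

Differences of this size, such as an exact mean of 30.555 against the listed 30.55 for Granite-3.1-8B-Instruct on V2, are at the rounding level. For Granite-3.1-3B-A800M-Instruct on V2, the same formula gives about 17.10.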
 
  **Model Architecture:**
  Granite-3.1-8B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
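GQA (grouped-query attention) lets several query heads share one key/value head, which shrinks the key/value cache kept during generation. The back-of-the-envelope Python sketch below uses the 8B Dense values from the architecture table in this section: 40 layers, 32 attention heads, 8 KV heads, and head size 128. The 2-byte cache entries (bf16/fp16), the reading of "128K" as 131,072 tokens, and the single-sequence setting are assumptions, not details from the model card.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence.

    The leading factor 2 counts one key tensor and one value tensor per layer.
    bytes_per_value=2 assumes a bf16/fp16 cache (an assumption, not from the card).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# 8B Dense values from the architecture table.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 40, 32, 8, 128
# Assumption: "128K" is read as 128 * 1024 = 131,072 tokens.
SEQ_LEN = 128 * 1024

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
gqa_total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
# Hypothetical baseline for comparison: one KV head per query head (standard MHA).
mha_total = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)

print(f"query heads sharing each KV head: {Q_HEADS // KV_HEADS}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"GQA cache, {SEQ_LEN} tokens: {gqa_total / 2**30:.0f} GiB")
print(f"MHA cache, {SEQ_LEN} tokens: {mha_total / 2**30:.0f} GiB")
```

Under these assumptions, the cache takes 160 KiB per token, or 20 GiB for a full 131,072-token sequence. That is a quarter of the 80 GiB that a hypothetical layout with one KV head per query head would need.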
 
+ <table>
+ <thead>
+ <tr>
+ <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B MoE</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">3B MoE</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Embedding size</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2048</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">4096</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1024</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1536</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of layers</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">40</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Attention head size</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">128</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of attention heads</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">32</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">16</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">24</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of KV heads</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP hidden size</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8192</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">12800</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">512</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">512</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP activation</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">SwiGLU</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of experts</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">—</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MoE TopK</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">—</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">—</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">8</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Initialization std</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.1</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.1</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Sequence length</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">128K</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">RoPE</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2.5B</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8.1B</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.3B</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">3.3B</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Active parameters</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2.5B</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8.1B</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">400M</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">800M</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Training tokens</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12T</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">12T</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10T</td>
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10T</td>
+ </tr>
+ </tbody></table>
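The parameter rows of this table can be roughly reproduced from the other rows. The Python sketch below is an illustrative approximation, not IBM's own accounting. It makes the following assumptions:

- no bias terms;
- a single set of embedding weights, since the architecture description says input and output embeddings are shared;
- three weight matrices per SwiGLU MLP, with the MoE models' MLP hidden size treated as a per-expert value;
- a linear router for the MoE models;
- a vocabulary size of 49,155, which this table does not give.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the vocabulary size is not listed in the table; 49,155 is used
# here for illustration and should be checked against the model's config.
VOCAB_SIZE = 49_155


@dataclass
class Arch:
    hidden: int          # "Embedding size"
    layers: int          # "Number of layers"
    head_dim: int        # "Attention head size"
    q_heads: int         # "Number of attention heads"
    kv_heads: int        # "Number of KV heads"
    mlp_hidden: int      # "MLP hidden size" (assumed per expert for the MoE models)
    experts: int = 1     # 1 means a dense MLP
    top_k: int = 1       # experts used per token


def param_counts(a: Arch, vocab: int = VOCAB_SIZE) -> Tuple[int, int]:
    """Approximate (total, active) parameters, assuming no bias terms."""
    attn = a.hidden * a.head_dim * 2 * (a.q_heads + a.kv_heads)  # Q, O and K, V
    expert = 3 * a.hidden * a.mlp_hidden      # SwiGLU: gate, up, down matrices
    router = a.hidden * a.experts if a.experts > 1 else 0
    norms = 2 * a.hidden                      # two RMSNorm weight vectors per layer
    layer_total = attn + a.experts * expert + router + norms
    layer_active = attn + a.top_k * expert + router + norms
    shared = vocab * a.hidden + a.hidden      # tied embeddings + final RMSNorm
    return a.layers * layer_total + shared, a.layers * layer_active + shared


MODELS = {
    "2B Dense": Arch(2048, 40, 64, 32, 8, 8192),
    "8B Dense": Arch(4096, 40, 128, 32, 8, 12800),
    "1B MoE": Arch(1024, 24, 64, 16, 8, 512, experts=32, top_k=8),
    "3B MoE": Arch(1536, 32, 64, 24, 8, 512, experts=40, top_k=8),
}

for name, arch in MODELS.items():
    total, active = param_counts(arch)
    print(f"{name}: total {total / 1e9:.2f}B, active {active / 1e9:.2f}B")
```

On those assumptions, the totals come out at about 2.53B, 8.17B, 1.33B, and 3.30B, close to the listed 2.5B, 8.1B, 1.3B, and 3.3B. The active counts for the MoE models (about 0.43B and 0.88B) are near the listed 400M and 800M but do not match exactly. The card therefore presumably counts some components differently.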
 
  **Training Data:**
  Overall, our SFT data comes largely from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 3.1 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf).