mgoin committed on
Commit
7725cbc
1 Parent(s): 5ce2782

Update README.md

  1. README.md +82 -2
README.md CHANGED
@@ -185,5 +185,85 @@ lm_eval \
  ### Accuracy
 
  #### Open LLM Leaderboard evaluation scores
-
- TBD
+ <table>
+ <tr>
+ <td><strong>Benchmark</strong>
+ </td>
+ <td><strong>Phi-3.5-mini-instruct</strong>
+ </td>
+ <td><strong>Phi-3.5-mini-instruct-FP8-KV(this model)</strong>
+ </td>
+ <td><strong>Recovery</strong>
+ </td>
+ </tr>
+ <tr>
+ <td>MMLU (5-shot)
+ </td>
+ <td>68.81
+ </td>
+ <td>68.56
+ </td>
+ <td>99.64%
+ </td>
+ </tr>
+ <tr>
+ <td>ARC Challenge (25-shot, acc_norm)
+ </td>
+ <td>64.68
+ </td>
+ <td>64.51
+ </td>
+ <td>99.74%
+ </td>
+ </tr>
+ <tr>
+ <td>GSM-8K (5-shot, strict-match)
+ </td>
+ <td>78.24
+ </td>
+ <td>77.26
+ </td>
+ <td>98.75%
+ </td>
+ </tr>
+ <tr>
+ <td>Hellaswag (10-shot, acc_norm)
+ </td>
+ <td>79.03
+ </td>
+ <td>78.88
+ </td>
+ <td>99.81%
+ </td>
+ </tr>
+ <tr>
+ <td>Winogrande (5-shot, acc)
+ </td>
+ <td>73.40
+ </td>
+ <td>73.80
+ </td>
+ <td>100.5%
+ </td>
+ </tr>
+ <tr>
+ <td>TruthfulQA (0-shot, mc2)
+ </td>
+ <td>56.39
+ </td>
+ <td>56.95
+ </td>
+ <td>100.9%
+ </td>
+ </tr>
+ <tr>
+ <td><strong>Average</strong>
+ </td>
+ <td><strong>70.09</strong>
+ </td>
+ <td><strong>70.00</strong>
+ </td>
+ <td><strong>99.89%</strong>
+ </td>
+ </tr>
+ </table>
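
The added table does not define the Recovery column. A minimal sketch, assuming Recovery means the quantized score expressed as a percentage of the baseline score, is shown below. The scores are copied from the table; the names `SCORES`, `recovery`, and `summarize` are hypothetical and not part of the README or of `lm_eval`. Under that assumption the per-benchmark values should land close to the reported ones, with small differences possible because the table is computed from rounded or truncated figures.

```python
# Scores copied from the table above as (baseline, FP8-KV quantized) pairs.
SCORES = {
    "MMLU (5-shot)": (68.81, 68.56),
    "ARC Challenge (25-shot, acc_norm)": (64.68, 64.51),
    "GSM-8K (5-shot, strict-match)": (78.24, 77.26),
    "Hellaswag (10-shot, acc_norm)": (79.03, 78.88),
    "Winogrande (5-shot, acc)": (73.40, 73.80),
    "TruthfulQA (0-shot, mc2)": (56.39, 56.95),
}


def recovery(baseline: float, quantized: float) -> float:
    """Assumed definition: quantized score as a percentage of the baseline."""
    return 100.0 * quantized / baseline


def summarize(scores):
    """Return per-benchmark recovery plus the three column means."""
    per_benchmark = {name: recovery(b, q) for name, (b, q) in scores.items()}
    n = len(scores)
    mean_baseline = sum(b for b, _ in scores.values()) / n
    mean_quantized = sum(q for _, q in scores.values()) / n
    # Mean of the per-benchmark recoveries (an assumption about how the
    # Average row is built), not the ratio of the two column means.
    mean_recovery = sum(per_benchmark.values()) / n
    return per_benchmark, mean_baseline, mean_quantized, mean_recovery


if __name__ == "__main__":
    per_benchmark, mean_b, mean_q, mean_r = summarize(SCORES)
    for name, value in per_benchmark.items():
        print(f"{name}: {value:.2f}%")
    print(f"Average: {mean_b:.2f} / {mean_q:.2f} / {mean_r:.2f}%")
```

A recovery above 100% (Winogrande, TruthfulQA) simply means the quantized model scored slightly higher than the baseline on that benchmark.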