alexmarques committed
Commit 295ceb1 · verified · 1 parent: 089c84e

Update README.md


Updates after model definition and tokenizer changes

Files changed (1): README.md (+22, -22)
README.md CHANGED
@@ -22,7 +22,7 @@ license: mit
  - **Model Developers:** Neural Magic
 
  Quantized version of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), a 14 billion-parameter open model trained using the Phi-3 datasets.
- It achieves an average score of 73.32 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.32.
+ It achieves an average score of 73.90 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.10.
 
  ### Model Optimizations
 
@@ -188,71 +188,71 @@ lm_eval \
  <tr>
  <td>MMLU (5-shot)
  </td>
- <td>75.64
+ <td>76.69
  </td>
- <td>76.04
+ <td>76.74
  </td>
- <td>100.0%
+ <td>100.1%
  </td>
  </tr>
  <tr>
  <td>ARC Challenge (25-shot)
  </td>
- <td>67.58
+ <td>69.45
  </td>
- <td>69.71
+ <td>69.37
  </td>
- <td>103.2%
+ <td>99.9%
  </td>
  </tr>
  <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
- <td>83.32
+ <td>85.22
  </td>
- <td>82.03
+ <td>84.15
  </td>
- <td>98.5%
+ <td>98.7%
  </td>
  </tr>
  <tr>
  <td>Hellaswag (10-shot)
  </td>
- <td>84.37
+ <td>85.10
  </td>
- <td>84.39
+ <td>84.76
  </td>
- <td>100.0%
+ <td>99.6%
  </td>
  </tr>
  <tr>
  <td>Winogrande (5-shot)
  </td>
- <td>75.45
+ <td>73.56
  </td>
- <td>73.01
+ <td>73.80
  </td>
- <td>96.8%
+ <td>100.3%
  </td>
  </tr>
  <tr>
  <td>TruthfulQA (0-shot)
  </td>
- <td>53.54
+ <td>54.57
  </td>
- <td>54.76
+ <td>54.57
  </td>
- <td>102.3%
+ <td>100.0%
  </td>
  </tr>
  <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>73.32</strong>
+ <td><strong>74.10</strong>
  </td>
- <td><strong>73.32</strong>
+ <td><strong>73.90</strong>
  </td>
- <td><strong>100.0%</strong>
+ <td><strong>99.7%</strong>
  </td>
  </tr>
  </table>
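The third column of the table above is a recovery percentage: the quantized model's score divided by the unquantized baseline's score, and the Average row is the plain mean over the six benchmarks. A minimal sketch of that arithmetic, using the updated (post-commit) scores from the table (the dict layout and helper name are illustrative, not part of the model card):

```python
# Recovery = quantized score / unquantized (baseline) score, in percent.
# Scores below are the updated values from the table in this commit.
baseline = {
    "MMLU": 76.69, "ARC Challenge": 69.45, "GSM-8K": 85.22,
    "Hellaswag": 85.10, "Winogrande": 73.56, "TruthfulQA": 54.57,
}
quantized = {
    "MMLU": 76.74, "ARC Challenge": 69.37, "GSM-8K": 84.15,
    "Hellaswag": 84.76, "Winogrande": 73.80, "TruthfulQA": 54.57,
}

def recovery(q: float, b: float) -> float:
    """Quantized-to-baseline score ratio, expressed as a percentage."""
    return 100.0 * q / b

# Per-benchmark recovery, e.g. MMLU -> 100.1%
for name in baseline:
    print(f"{name}: {recovery(quantized[name], baseline[name]):.1f}%")

# Averages reproduce the table's bottom row: 74.10 vs 73.90 (99.7%).
avg_baseline = sum(baseline.values()) / len(baseline)
avg_quantized = sum(quantized.values()) / len(quantized)
print(f"Average: {avg_baseline:.2f} vs {avg_quantized:.2f} "
      f"({recovery(avg_quantized, avg_baseline):.1f}%)")
```

Rounding the computed averages to two decimals gives exactly the 74.10 and 73.90 quoted in the updated README sentence.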