Locutusque leaderboard-pr-bot committed on
Commit
99bd765
1 Parent(s): 214e48a

Adding Evaluation Results (#3)

- Adding Evaluation Results (da25352a1da515de11360c06b10aa5c92f7b8b27)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -122,6 +122,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 13.36
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 3.18
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.11
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.07
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.5
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Locutusque/TinyMistral-248M-v2.5
+      name: Open LLM Leaderboard
 ---
 # TinyMistral-248M-v2.5
 This model was created by merging TinyMistral-248M-v1 and v2, then further pretraining on synthetic textbooks. The resulting model's performance is superior to both, after personal evaluation.
@@ -169,3 +261,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |47.83|
 |GSM8k (5-shot) | 0.00|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__TinyMistral-248M-v2.5)
+
+| Metric            |Value|
+|-------------------|----:|
+|Avg.               | 3.87|
+|IFEval (0-Shot)    |13.36|
+|BBH (3-Shot)       | 3.18|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot)      | 0.11|
+|MuSR (0-shot)      | 5.07|
+|MMLU-PRO (5-shot)  | 1.50|
+
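
Note on the Avg. row: the value 3.87 is consistent with an unweighted arithmetic mean of the six benchmark scores in the added table. The diff itself does not state how the average is computed, so the sketch below is only a sanity check under that assumption; the names `scores` and `average` are illustrative and do not come from the commit.

```python
# Scores copied from the table added to README.md in this commit.
scores = {
    "IFEval (0-Shot)": 13.36,
    "BBH (3-Shot)": 3.18,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 0.11,
    "MuSR (0-shot)": 5.07,
    "MMLU-PRO (5-shot)": 1.50,
}

# Assumption: the reported average is a plain mean with equal weights.
average = sum(scores.values()) / len(scores)

# Round to two decimals to match the precision used in the table.
print(f"Avg. = {average:.2f}")
```

The six scores sum to 23.22, and 23.22 / 6 = 3.87, which matches the reported Avg. value.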