Add vision evals
README.md
CHANGED
@@ -321,6 +321,26 @@ Non-coding tasks were evaluated with [lm-evaluation-harness](https://github.com/
|
|
321 |
--batch_size auto
|
322 |
```
|
323 |
|
324 |
**Coding**
|
325 |
|
326 |
The commands below can be used for mbpp by simply replacing the dataset name.
|
@@ -353,7 +373,6 @@ evalplus.evaluate \
|
|
353 |
|
354 |
### Accuracy
|
355 |
|
356 |
-
#### Open LLM Leaderboard evaluation scores
|
357 |
<table>
|
358 |
<tr>
|
359 |
<th>Category
|
@@ -513,5 +532,28 @@ evalplus.evaluate \
|
|
513 |
<td>100.2%
|
514 |
</td>
|
515 |
</tr>
|
516 |
</table>
|
517 |
|
|
|
321 |
--batch_size auto
|
322 |
```
|
323 |
|
324 |
+
**MMMU**
|
325 |
+
```
|
326 |
+
lm_eval \
|
327 |
+
--model vllm-vlm \
|
328 |
+
--model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunked_prefill=True,tensor_parallel_size=2 \
|
329 |
+
--tasks mmmu_val \
|
330 |
+
--apply_chat_template \
|
331 |
+
--batch_size auto
|
332 |
+
```
|
333 |
+
|
334 |
+
**ChartQA**
|
335 |
+
```
|
336 |
+
lm_eval \
|
337 |
+
--model vllm-vlm \
|
338 |
+
--model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunked_prefill=True,tensor_parallel_size=2 \
|
339 |
+
--tasks chartqa \
|
340 |
+
--apply_chat_template \
|
341 |
+
--batch_size auto
|
342 |
+
```
|
343 |
+
|
344 |
**Coding**
|
345 |
|
346 |
The commands below can be used for mbpp by simply replacing the dataset name.
|
|
|
373 |
|
374 |
### Accuracy
|
375 |
|
|
|
376 |
<table>
|
377 |
<tr>
|
378 |
<th>Category
|
|
|
532 |
<td>100.2%
|
533 |
</td>
|
534 |
</tr>
|
535 |
+
<tr>
|
536 |
+
<td rowspan="2" ><strong>Vision</strong>
|
537 |
+
</td>
|
538 |
+
<td>MMMU (0-shot)
|
539 |
+
</td>
|
540 |
+
<td>52.11
|
541 |
+
</td>
|
542 |
+
<td>50.11
|
543 |
+
</td>
|
544 |
+
<td>96.2%
|
545 |
+
</td>
|
546 |
+
</tr>
|
547 |
+
<tr>
|
548 |
+
<td>ChartQA (0-shot)
|
549 |
+
</td>
|
550 |
+
<td>81.36
|
551 |
+
</td>
|
552 |
+
<td>80.92
|
553 |
+
</td>
|
554 |
+
<td>99.5%
|
555 |
+
</td>
|
556 |
+
</tr>
|
558 |
</table>
|
559 |
|
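The percentages in the new Vision rows appear to be the quantized score divided by the baseline score. The column headers are not visible in this diff, so that reading is an assumption, although the numbers are consistent with it. A minimal sketch for re-checking the arithmetic, using a hypothetical `recovery` helper that is not part of the README or of lm-evaluation-harness:

```shell
#!/usr/bin/env bash
# recovery BASELINE QUANTIZED
# Prints the quantized score as a percentage of the baseline, to one decimal place.
# Hypothetical helper for illustration only; assumes the first numeric column
# of the table is the baseline score and the second is the quantized score.
recovery() {
  awk -v base="$1" -v quant="$2" 'BEGIN { printf "%.1f\n", 100 * quant / base }'
}

echo "MMMU (0-shot):    $(recovery 52.11 50.11)%"   # table reports 96.2%
echo "ChartQA (0-shot): $(recovery 81.36 80.92)%"   # table reports 99.5%
```

Running the script should reproduce the two values shown in the table, which is a quick check to repeat whenever a score in either column is updated.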