alexmarques committed on
Commit 1d08bf0 · verified · 1 Parent(s): c46a231

Add vision evals

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -321,6 +321,26 @@ Non-coding tasks were evaluated with [lm-evaluation-harness](https://github.com/
321
  --batch_size auto
322
  ```
323
 
324
  **Coding**
325
 
326
  The commands below can be used for mbpp by simply replacing the dataset name.
@@ -353,7 +373,6 @@ evalplus.evaluate \
353
 
354
  ### Accuracy
355
 
356
- #### Open LLM Leaderboard evaluation scores
357
  <table>
358
  <tr>
359
  <th>Category
@@ -513,5 +532,28 @@ evalplus.evaluate \
513
  <td>100.2%
514
  </td>
515
  </tr>
516
  </table>
517
 
 
321
  --batch_size auto
322
  ```
323
 
324
+ **MMMU**
325
+ ```
326
+ lm_eval \
327
+ --model vllm \
328
+ --model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunked_prefill=True,tensor_parallel_size=2 \
329
+ --tasks mmmu_val \
330
+ --apply_chat_template \
331
+ --batch_size auto
332
+ ```
333
+
334
+ **ChartQA**
335
+ ```
336
+ lm_eval \
337
+ --model vllm \
338
+ --model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunked_prefill=True,tensor_parallel_size=2 \
339
+ --tasks chartqa \
340
+ --apply_chat_template \
341
+ --batch_size auto
342
+ ```
343
+
344
  **Coding**
345
 
346
  The commands below can be used for mbpp by simply replacing the dataset name.
 
373
 
374
  ### Accuracy
375
 
 
376
  <table>
377
  <tr>
378
  <th>Category
 
532
  <td>100.2%
533
  </td>
534
  </tr>
535
+ <tr>
536
+ <td rowspan="2" ><strong>Vision</strong>
537
+ </td>
538
+ <td>MMMU (0-shot)
539
+ </td>
540
+ <td>52.11
541
+ </td>
542
+ <td>50.11
543
+ </td>
544
+ <td>96.2%
545
+ </td>
546
+ </tr>
547
+ <tr>
548
+ <td>ChartQA (0-shot)
549
+ </td>
550
+ <td>81.36
551
+ </td>
552
+ <td>80.92
553
+ </td>
554
+ <td>99.5%
555
+ </td>
556
+ </tr>
557
+ <tr>
558
  </table>
559
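
The third value in each new Vision row appears to be a recovery figure. The sketch below checks that reading, assuming the first score is the unquantized baseline, the second is the quantized model, and recovery is the quantized score divided by the baseline. The column headers are not visible in this diff, so that interpretation and the helper name `recovery_percent` are assumptions made for illustration.

```python
def recovery_percent(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline, to one decimal."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return round(100.0 * quantized / baseline, 1)


# Scores copied from the Vision rows added in this commit.
# Assumed order: (baseline score, quantized score).
vision_scores = {
    "MMMU (0-shot)": (52.11, 50.11),
    "ChartQA (0-shot)": (81.36, 80.92),
}

for task, (baseline, quantized) in vision_scores.items():
    print(f"{task}: {recovery_percent(baseline, quantized)}%")
```

Under these assumptions the computed values match the 96.2% and 99.5% entries in the table.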