RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16

Image-Text-to-Text

compressed-tensors

alexmarques committed 14 days ago

Commit 1d08bf0 (verified) · 1 parent: c46a231

Add vision evals

Files changed (1)

README.md +43 -1

@@ -321,6 +321,26 @@ Non-coding tasks were evaluated with [lm-evaluation-harness](https://github.com/
     --batch_size auto
   ```
+  **MMMU**
+  ```
+  lm_eval \
+    --model vllm \
+    --model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunk_prefill=True,tensor_parallel_size=2 \
+    --tasks mmmu_val \
+    --apply_chat_template \
+    --batch_size auto
+  ```
+  **ChartQA**
+  ```
+  lm_eval \
+    --model vllm \
+    --model_args pretrained="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16",dtype=auto,gpu_memory_utilization=0.9,max_images=8,enable_chunk_prefill=True,tensor_parallel_size=2 \
+    --tasks chartqa \
+    --apply_chat_template \
+    --batch_size auto
+  ```
 **Coding**
 The commands below can be used for mbpp by simply replacing the dataset name.
@@ -353,7 +373,6 @@ evalplus.evaluate \
 ### Accuracy
-#### Open LLM Leaderboard evaluation scores
 <table>
   <tr>
    <th>Category
@@ -513,5 +532,28 @@ evalplus.evaluate \
    <td>100.2%
    </td>
   </tr>
+  <tr>
+   <td rowspan="2" ><strong>Vision</strong>
+   </td>
+   <td>MMMU (0-shot)
+   </td>
+   <td>52.11
+   </td>
+   <td>50.11
+   </td>
+   <td>96.2%
+   </td>
+  </tr>
+  <tr>
+   <td>ChartQA (0-shot)
+   </td>
+   <td>81.36
+   </td>
+   <td>80.92
+   </td>
+   <td>99.5%
+   </td>
+  </tr>
+  <tr>
 </table>
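
The percentage in the last cell of each new Vision row can be reproduced from the two scores before it. The sketch below assumes (the diff does not show the column headers) that the first score is the unquantized baseline, the second is the quantized model, and the percentage is their ratio rounded to one decimal place. The `recovery` helper is a hypothetical name used only for illustration.

```python
def recovery(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumption: this mirrors how the table's percentage column is derived;
    the diff itself does not state the formula.
    """
    return round(100.0 * quantized / baseline, 1)


# Scores copied from the two Vision rows added in this commit:
# (assumed baseline, assumed quantized)
vision_scores = {
    "MMMU (0-shot)": (52.11, 50.11),
    "ChartQA (0-shot)": (81.36, 80.92),
}

for task, (baseline, quantized) in vision_scores.items():
    print(f"{task}: {recovery(baseline, quantized)}%")
```

Under that assumption, both computed values match the percentages in the table (96.2% and 99.5%).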