Update README.md
README.md
@@ -169,21 +169,23 @@ Diverse thematic data were included to enhance the model's capabilities in subta

As there is a lack of multimodal multilingual evaluation data, we haven't performed a thorough multilingual evaluation yet (coming soon). The English evaluations are shown in the table below:

| Task           | Subtask                 | Metric                  | Value     |
|----------------|-------------------------|-------------------------|-----------|
| ai2d           |                         | exact_match             | 0.7451    |
| mme            | cognition_score         | mme_cognition_score     | 246.4286  |
|                | perception_score        | mme_perception_score    | 1371.8164 |
| mmmu_val       |                         | accuracy                | 0.3689    |
| mmstar         | average                 | accuracy                | 0.4865    |
|                | coarse perception       | accuracy                | 0.7127    |
|                | fine-grained perception | accuracy                | 0.3799    |
|                | instance reasoning      | accuracy                | 0.5674    |
|                | logical reasoning       | accuracy                | 0.4478    |
|                | math                    | accuracy                | 0.4279    |
|                | science & technology    | accuracy                | 0.3832    |
| realworldqa    |                         | exact_match             | 0.5699    |
| mmbench_en_dev |                         | exact_match             | 0.7113    |
| docvqa_val     |                         | anls                    | 0.6805    |
| infovqa_val    |                         | anls                    | 0.4859    |

---
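
The `anls` metric reported for docvqa_val and infovqa_val is Average Normalized Levenshtein Similarity: for each question, the prediction is compared against every reference answer, the best similarity is kept, and similarities below a threshold (conventionally 0.5) are clipped to 0. As a minimal illustrative sketch — not the exact scorer used for these numbers — it can be computed like this:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[len(b)]

def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity (illustrative sketch).

    `references` is a list of lists: each prediction may be matched
    against several acceptable answers; the best similarity counts.
    Scores below `threshold` are clipped to 0, per the usual ANLS rule.
    """
    total = 0.0
    for pred, refs in zip(predictions, references):
        best = 0.0
        for ref in refs:
            p, r = pred.strip().lower(), ref.strip().lower()
            denom = max(len(p), len(r))
            sim = 1.0 if denom == 0 else 1.0 - levenshtein(p, r) / denom
            best = max(best, sim)
        total += best if best >= threshold else 0.0
    return total / len(predictions)
```

For example, `anls(["blue pen"], [["blue pen", "pen"]])` scores 1.0 because the prediction matches one reference exactly, while a near-miss like a one-character typo is credited partially rather than zeroed out, which is why ANLS is preferred over exact match for OCR-heavy tasks like DocVQA and InfoVQA.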