Update README.md
#1 by qc808082 - opened
README.md CHANGED
@@ -16,6 +16,16 @@ pipeline_tag: visual-question-answering
- As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on various visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [EMMA](https://emma-benchmark.github.io/#leaderboard), [MMMUPro](https://mmmu-benchmark.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).
+| Models            | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
+|-------------------|-------------------|----------------------|----------------------|
+| GPT4o (R1-1V Rep) | 30.6              | 60                   | 41.2                 |
+| Gemini-2.0-Flash  | 41.3              | 70.1                 | 50.6                 |
+| Claude 3.5 Sonnet | 33.5              | 67.7                 | 47.8                 |
+| QvQ-72B           | 35.9              | 71.4                 | 48.6                 |
+| InternVL2.5-78B   | 34.9              | 72.3                 | 51.7                 |
+| Qwen-VL-2.5-72B   | 38.1              | 74.8                 | 57.18                |
+| INFRL-VL-Preview  | 41.9              | 77.8                 | 58.84                |
+
## Evaluation

We will release a code repository with vLLM support for VLM evaluation.
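
A quick way to sanity-check the table added in this change is to recompute the per-benchmark leader and the margin of INFRL-VL-Preview over Qwen-VL-2.5-72B. The sketch below is not part of the PR or of the planned evaluation repository; the scores are copied from the table, and the names `SCORES`, `BENCHMARKS`, `best_per_benchmark`, and `margin_over` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the table in this diff (higher is better).
# Column order: MathVision (test), MathVista (testmini), MathVerse (testmini).
BENCHMARKS = ("MathVision (test)", "MathVista (testmini)", "MathVerse (testmini)")

SCORES = {
    "GPT4o (R1-1V Rep)": (30.6, 60.0, 41.2),
    "Gemini-2.0-Flash": (41.3, 70.1, 50.6),
    "Claude 3.5 Sonnet": (33.5, 67.7, 47.8),
    "QvQ-72B": (35.9, 71.4, 48.6),
    "InternVL2.5-78B": (34.9, 72.3, 51.7),
    "Qwen-VL-2.5-72B": (38.1, 74.8, 57.18),
    "INFRL-VL-Preview": (41.9, 77.8, 58.84),
}


def best_per_benchmark(scores):
    """Return {benchmark: (model, score)} for the highest score in each column."""
    result = {}
    for i, bench in enumerate(BENCHMARKS):
        model = max(scores, key=lambda name: scores[name][i])
        result[bench] = (model, scores[model][i])
    return result


def margin_over(scores, model, baseline):
    """Return the per-benchmark score difference (model minus baseline), rounded to 2 places."""
    return tuple(round(a - b, 2) for a, b in zip(scores[model], scores[baseline]))


if __name__ == "__main__":
    for bench, (model, score) in best_per_benchmark(SCORES).items():
        print(f"{bench}: {model} ({score})")
    print(margin_over(SCORES, "INFRL-VL-Preview", "Qwen-VL-2.5-72B"))
```

This only compares the numbers as listed in the table; it does not reproduce any evaluation run.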