Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -16,6 +16,16 @@ pipeline_tag: visual-question-answering
  - As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source vision-language (VL) model on several visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [EMMA](https://emma-benchmark.github.io/#leaderboard), [MMMUPro](https://mmmu-benchmark.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).
 
 
+ | Models | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
+ |-------------------|-------------------|----------------------|----------------------|
+ | GPT-4o (R1-1V Rep) | 30.6 | 60.0 | 41.2 |
+ | Gemini-2.0-Flash | 41.3 | 70.1 | 50.6 |
+ | Claude 3.5 Sonnet | 33.5 | 67.7 | 47.8 |
+ | QvQ-72B | 35.9 | 71.4 | 48.6 |
+ | InternVL2.5-78B | 34.9 | 72.3 | 51.7 |
+ | Qwen2.5-VL-72B | 38.1 | 74.8 | 57.18 |
+ | INFRL-VL-Preview | 41.9 | 77.8 | 58.84 |
+
  ## Evaluation
 
  We will release a code repository with vLLM support for VLM evaluation.