NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX

Quickstart

Run this model directly with nexa-sdk installed. In the nexa-sdk CLI:

NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX

Overview

In the five months since Qwen2-VL’s release, numerous developers have built new models on top of the Qwen2-VL vision-language models and given us valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.

Key Enhancements:

  • Understand things visually: Qwen2.5-VL is not only proficient at recognizing common objects such as flowers, birds, fish, and insects, but also highly capable of analyzing text, charts, icons, graphics, and layouts within images.

  • Being agentic: Qwen2.5-VL can act directly as a visual agent that reasons and dynamically directs tools, making it capable of computer use and phone use.

  • Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos more than 1 hour long, and it adds a new ability to capture events by pinpointing the relevant video segments.

  • Capable of visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes.

  • Generating structured outputs: for data such as scanned invoices, forms, and tables, Qwen2.5-VL can produce structured outputs of their contents, benefiting uses in finance, commerce, and similar domains.
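The localization and structured-output features return JSON as text, which downstream code still has to parse. The sketch below assumes the model was prompted to answer with a list of objects shaped like {"bbox_2d": [x1, y1, x2, y2], "label": "..."}, possibly wrapped in a Markdown json code fence. The key names follow Qwen2.5-VL grounding examples but are an assumption here, and `Detection` and `parse_detections` are hypothetical helpers, not part of nexa-sdk or any Qwen API.

```python
import json
import re
from dataclasses import dataclass

# Three backticks, built at runtime so the pattern below stays readable.
FENCE = "`" * 3
# Capture the body of an optional fenced block, tolerating a "json" tag.
FENCED_BODY = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


@dataclass
class Detection:
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


def parse_detections(reply: str) -> list[Detection]:
    """Parse a reply assumed to hold a JSON list of {"bbox_2d": [...], "label": ...} objects."""
    match = FENCED_BODY.search(reply)
    payload = match.group(1) if match else reply  # fall back to the raw reply
    detections = []
    for item in json.loads(payload):
        # Truncate coordinates to integer pixels.
        x1, y1, x2, y2 = (int(v) for v in item["bbox_2d"])
        detections.append(Detection(item.get("label", ""), x1, y1, x2, y2))
    return detections
```

Before drawing the boxes, check whether the coordinates refer to the original image or to the resized input the model actually saw, and adjust the keys if your prompt requests a different schema.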

Benchmark Results

Image Benchmarks

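| Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B |
| --- | --- | --- | --- | --- | --- |
| MMMU (val) | 56 | 50.4 | 60 | 54.1 | 58.6 |
| MMMU-Pro (val) | 34.3 | - | 37.6 | 30.5 | 41.0 |
| DocVQA (test) | 93 | 93 | - | 94.5 | 95.7 |
| InfoVQA (test) | 77.6 | - | - | 76.5 | 82.6 |
| ChartQA (test) | 84.8 | - | - | 83.0 | 87.3 |
| TextVQA (val) | 79.1 | 80.1 | - | 84.3 | 84.9 |
| OCRBench | 822 | 852 | 785 | 845 | 864 |
| CC_OCR | 57.7 | - | - | 61.6 | 77.8 |
| MMStar | 62.8 | - | - | 60.7 | 63.9 |
| MMBench-V1.1-En (test) | 79.4 | 78.0 | 76.0 | 80.7 | 82.6 |
| MMT-Bench (test) | - | - | - | 63.7 | 63.6 |
| MMStar | 61.5 | 57.5 | 54.8 | 60.7 | 63.9 |
| MMVet (GPT-4-Turbo) | 54.2 | 60.0 | 66.9 | 62.0 | 67.1 |
| HallBench (avg) | 45.2 | 48.1 | 46.1 | 50.6 | 52.9 |
| MathVista (testmini) | 58.3 | 60.6 | 52.4 | 58.2 | 68.2 |
| MathVision | - | - | - | 16.3 | 25.07 |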

Video Benchmarks

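| Benchmark | Qwen2-VL-7B | Qwen2.5-VL-7B |
| --- | --- | --- |
| MVBench | 67.0 | 69.6 |
| PerceptionTest (test) | 66.9 | 70.5 |
| Video-MME (without / with subtitles) | 63.3/69.0 | 65.1/71.6 |
| LVBench | - | 45.3 |
| LongVideoBench | - | 54.7 |
| MMBench-Video | 1.44 | 1.79 |
| TempCompass | - | 71.7 |
| MLVU | - | 70.2 |
| CharadesSTA (mIoU) | - | 43.6 |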

Agent Benchmarks

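| Benchmark | Qwen2.5-VL-7B |
| --- | --- |
| ScreenSpot | 84.7 |
| ScreenSpot Pro | 29.0 |
| AITZ (EM) | 81.9 |
| Android Control High (EM) | 60.1 |
| Android Control Low (EM) | 93.7 |
| AndroidWorld (SR) | 25.5 |
| MobileMiniWob++ (SR) | 91.4 |

EM = exact match; SR = success rate.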

Reference

Original model card: Qwen/Qwen2.5-VL-7B-Instruct

Safetensors · Model size: 1.87B params · Tensor types: F16, U32
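The U32 tensor type reflects how 4-bit weights are typically stored in MLX: each 32-bit word holds eight 4-bit codes, and each group of codes shares a floating-point scale and bias used to reconstruct approximate weights as w ≈ scale · q + bias (the F16 tensors presumably hold these scales and biases along with any unquantized weights). The plain-Python sketch below illustrates that packing and affine reconstruction. It is not MLX's actual kernel, and the bit order (first code in the lowest bits) and the helper names `pack_4bit`, `unpack_4bit`, and `dequantize` are assumptions for illustration.

```python
def pack_4bit(values: list[int]) -> int:
    """Pack eight unsigned 4-bit codes into one 32-bit word (first code in the lowest bits)."""
    if len(values) != 8 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("expected eight integers in [0, 15]")
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)  # each code occupies its own 4-bit slot
    return word


def unpack_4bit(word: int) -> list[int]:
    """Recover the eight 4-bit codes from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(codes: list[int], scale: float, bias: float) -> list[float]:
    """Affine reconstruction w ~= scale * q + bias, using one scale/bias pair per group."""
    return [scale * q + bias for q in codes]


codes = [0, 3, 7, 15, 1, 8, 12, 5]
word = pack_4bit(codes)
assert unpack_4bit(word) == codes  # packing round-trips losslessly
```

Because one U32 element stands for eight weights, a parameter count taken over stored tensor elements comes out well below the 7B in the model name.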