## Introduction
- Paper
- GitHub
- Project Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
- Benchmark: MM-AlignBench
- Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO
This is the official repo of LLaVANext-OmniAlign(OA)-32B-DPO, introduced in *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.
LLaVANext-OmniAlign-32B-DPO follows the LLaVA-Next architecture and uses Qwen2.5-32B-Instruct as its language model.
Applying a DPO stage with the OmniAlign-V-DPO dataset further improves the alignment of MLLMs with human preferences.
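For context, a DPO stage trains the policy on preferred/rejected response pairs such as those in OmniAlign-V-DPO. The standard DPO objective (Rafailov et al., 2023) is sketched below, where $x$ is the image-plus-prompt input and $(y_w, y_l)$ are the chosen and rejected responses; the exact formulation and hyperparameters used in the paper may differ:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\pi_{\mathrm{ref}}$ is the frozen SFT model (here, LLaVANext-OA-32B) and $\beta$ controls how far the policy may drift from it.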
## Performance
Integrating the OmniAlign-V-DPO dataset into the DPO stage further improves alignment with human preferences: our LLaVANext-OA-32B-DPO even surpasses Qwen2VL-72B on MM-AlignBench.
| Model | Win Rate (%) | Reward | Better+ | Better | Tie | Worse | Worse+ |
|---|---|---|---|---|---|---|---|
| Claude3.5V-Sonnet | 84.9 | +51.4 | 70 | 144 | 12 | 31 | 4 |
| GPT-4o | 81.3 | +49.0 | 81 | 124 | 12 | 31 | 4 |
| GPT-4V | 82.5 | +46.0 | 57 | 157 | 12 | 31 | 1 |
| GeminiFlash1.5-002 | 77.0 | +39.1 | 56 | 138 | 14 | 35 | 9 |
| LLaVANext-OA-32B-DPO | 74.2 | +36.9 | 49 | 138 | 20 | 40 | 5 |
| Qwen2VL-72B | 61.5 | +21.6 | 43 | 112 | 15 | 75 | 7 |
| LLaVANext-OA-32B | 62.3 | +19.4 | 31 | 126 | 19 | 62 | 14 |
| Claude-3V-Sonnet (reference) | 50.0 | 0.0 | - | - | - | - | - |
| Qwen2VL-7B | 44.4 | -5.8 | 28 | 84 | 5 | 101 | 34 |
| InternVL2-72B | 44.4 | -6.9 | 19 | 93 | 8 | 98 | 34 |
| InternVL2-8B-MPO | 40.1 | -10.9 | 26 | 75 | 10 | 100 | 41 |
| InternVL2-8B | 31.3 | -21.8 | 18 | 61 | 15 | 109 | 49 |
| LLaMA3.2-Vision-11B | 27.8 | -33.7 | 18 | 52 | 4 | 98 | 80 |
| LLaVANext-Qwen32B | 26.6 | -29.0 | 16 | 51 | 10 | 121 | 54 |
| LLaVA-OneVision-7B | 23.8 | -46.2 | 14 | 46 | 1 | 75 | 116 |
| MiniCPM-V-2.5 | 12.7 | -53.0 | 9 | 23 | 8 | 116 | 96 |
| Xcomposer2.5-7B | 7.5 | -74.0 | 5 | 14 | 3 | 63 | 167 |
| Idefics3-8B | 2.7 | -92.3 | 3 | 4 | 0 | 15 | 230 |
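To read the table: the reported numbers are consistent with each of the $N = 252$ judged comparisons being scored +100 (Better+), +50 (Better), 0 (Tie), -50 (Worse), or -100 (Worse+). This is our reading inferred from the data, not an official definition:

$$
\text{WinRate} = \frac{N_{\text{Better+}} + N_{\text{Better}}}{N}, \qquad
\text{Reward} = \frac{100\,N_{\text{Better+}} + 50\,N_{\text{Better}} - 50\,N_{\text{Worse}} - 100\,N_{\text{Worse+}}}{N}
$$

For example, for LLaVANext-OA-32B-DPO: $(49 + 138)/252 = 74.2\%$ and $(100 \cdot 49 + 50 \cdot 138 - 50 \cdot 40 - 100 \cdot 5)/252 = +36.9$, matching the table.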
## How to use
Please refer to our GitHub repository for details on training and evaluation.
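A minimal inference sketch follows, assuming the checkpoint is compatible with the Hugging Face Transformers LLaVA-Next classes; the Hub id placeholder, the example image URL, and the availability of a chat template on this checkpoint are assumptions, so consult the GitHub repo for the officially supported pipeline:

```python
# Minimal inference sketch. Assumes this checkpoint loads with the
# Transformers LLaVA-Next classes; replace model_id with this model's Hub id.
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "<this-repo-id>"  # placeholder, not an official identifier

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this COCO sample is a common documentation example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build the prompt with the checkpoint's chat template (assumed to ship
# with the processor config).
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```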