
## Introduction

- Paper: Paper
- Github: Github
- Page: Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
- MM-AlignBench: MM-AlignBench
- Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO

This is the official repo of LLaVANext-OmniAlign(OA)-32B-DPO, introduced in *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.

LLaVANext-OmniAlign-32B-DPO follows the LLaVA-Next architecture and uses Qwen2.5-32B-Instruct as its language model.

By applying a DPO stage with the OmniAlign-V-DPO dataset, we further improve the alignment of MLLMs with human preference.

## Performance

By integrating the OmniAlign-V-DPO dataset in the DPO stage, we further improve the alignment of MLLMs with human preference. Our LLaVANext-OA-32B-DPO even surpasses Qwen2VL-72B on MM-AlignBench.

| Model | Win Rate (%) | Reward | Better+ | Better | Tie | Worse | Worse+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude3.5V-Sonnet | 84.9 | +51.4 | 70 | 144 | 12 | 31 | 4 |
| GPT-4o | 81.3 | +49.0 | 81 | 124 | 12 | 31 | 4 |
| GPT-4V | 82.5 | +46.0 | 57 | 157 | 12 | 31 | 1 |
| GeminiFlash1.5-002 | 77.0 | +39.1 | 56 | 138 | 14 | 35 | 9 |
| LLaVANext-OA-32B-DPO | 74.2 | +36.9 | 49 | 138 | 20 | 40 | 5 |
| Qwen2VL-72B | 61.5 | +21.6 | 43 | 112 | 15 | 75 | 7 |
| LLaVANext-OA-32B | 62.3 | +19.4 | 31 | 126 | 19 | 62 | 14 |
| Claude-3V-Sonnet | 50 | 0 | - | - | - | - | - |
| Qwen2VL-7B | 44.4 | -5.8 | 28 | 84 | 5 | 101 | 34 |
| InternVL2-72B | 44.4 | -6.9 | 19 | 93 | 8 | 98 | 34 |
| InternVL2-8B-MPO | 40.1 | -10.9 | 26 | 75 | 10 | 100 | 41 |
| InternVL2-8B | 31.3 | -21.8 | 18 | 61 | 15 | 109 | 49 |
| LLaMA3.2-Vision-11B | 27.8 | -33.7 | 18 | 52 | 4 | 98 | 80 |
| LLaVANext-Qwen32B | 26.6 | -29.0 | 16 | 51 | 10 | 121 | 54 |
| LLaVA-OneVision-7B | 23.8 | -46.2 | 14 | 46 | 1 | 75 | 116 |
| MiniCPM-V-2.5 | 12.7 | -53.0 | 9 | 23 | 8 | 116 | 96 |
| Xcomposer2.5-7B | 7.5 | -74.0 | 5 | 14 | 3 | 63 | 167 |
| Idefics3-8B | 2.7 | -92.3 | 3 | 4 | 0 | 15 | 230 |

## How to use

Please refer to our Github repository for more details about training and evaluation.
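
For quick inference, below is a minimal sketch assuming the released checkpoint can be loaded with the Hugging Face `transformers` LLaVA-NeXT classes; the model id, image path, and prompt are placeholders, and the official training/evaluation pipeline in our Github may use its own loading path.

```python
# Minimal inference sketch (assumption: the checkpoint is compatible with the
# Hugging Face LLaVA-NeXT classes; the official repo may provide its own loader).
# The model id and image path below are placeholders.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "PhoenixZ/LLaVANext-OA-32B-DPO"  # hypothetical id, check the model page

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shard the 32B model across available GPUs
)

# Build a single-turn conversation with one image and one question.
image = Image.open("example.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```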