
## Introduction

- Paper: Paper
- GitHub: GitHub
- Page: Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
- MM-AlignBench: MM-AlignBench
- Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO

This is the official repository of LLaVANext-OmniAlign (OA)-32B-DPO from *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.

LLaVANext-OmniAlign-32B-DPO follows the LLaVA-Next architecture and uses Qwen2.5-32B-Instruct as its language model.

By applying a DPO stage with the OmniAlign-V-DPO dataset, we further improve the alignment of the MLLM with human preferences.
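The DPO stage optimizes the standard Direct Preference Optimization objective (Rafailov et al., 2023). Below is a minimal sketch of that loss for a single preference pair, assuming the inputs are summed response log-probabilities under the policy and a frozen reference model; it is an illustration of the objective, not the training code used for this checkpoint.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a response under
    the policy (pi_*) or the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: -log(sigmoid(margin)).
    return math.log(1.0 + math.exp(-margin))
```

Minimizing this loss pushes the policy to assign relatively higher likelihood to the preferred (chosen) response than the reference model does, which is how preference data such as OmniAlign-V-DPO shifts model behavior without a separate reward model.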

## Performance

With OmniAlign-V-DPO integrated into the DPO stage, our LLaVANext-OA-32B-DPO even surpasses Qwen2VL-72B on MM-AlignBench.

| Model | Win Rate | Reward | Better+ | Better | Tie | Worse | Worse+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude3.5V-Sonnet | 84.9 | +51.4 | 70 | 144 | 12 | 31 | 4 |
| GPT-4o | 81.3 | +49.0 | 81 | 124 | 12 | 31 | 4 |
| GPT-4V | 82.5 | +46.0 | 57 | 157 | 12 | 31 | 1 |
| GeminiFlash1.5-002 | 77.0 | +39.1 | 56 | 138 | 14 | 35 | 9 |
| LLaVANext-OA-32B-DPO | 74.2 | +36.9 | 49 | 138 | 20 | 40 | 5 |
| Qwen2VL-72B | 61.5 | +21.6 | 43 | 112 | 15 | 75 | 7 |
| LLaVANext-OA-32B | 62.3 | +19.4 | 31 | 126 | 19 | 62 | 14 |
| Claude-3V-Sonnet | 50 | 0 | - | - | - | - | - |
| Qwen2VL-7B | 44.4 | -5.8 | 28 | 84 | 5 | 101 | 34 |
| InternVL2-72B | 44.4 | -6.9 | 19 | 93 | 8 | 98 | 34 |
| InternVL2-8B-MPO | 40.1 | -10.9 | 26 | 75 | 10 | 100 | 41 |
| InternVL2-8B | 31.3 | -21.8 | 18 | 61 | 15 | 109 | 49 |
| LLaMA3.2-Vision-11B | 27.8 | -33.7 | 18 | 52 | 4 | 98 | 80 |
| LLaVANext-Qwen32B | 26.6 | -29.0 | 16 | 51 | 10 | 121 | 54 |
| LLaVA-OneVision-7B | 23.8 | -46.2 | 14 | 46 | 1 | 75 | 116 |
| MiniCPM-V-2.5 | 12.7 | -53.0 | 9 | 23 | 8 | 116 | 96 |
| Xcomposer2.5-7B | 7.5 | -74.0 | 5 | 14 | 3 | 63 | 167 |
| Idefics3-8B | 2.7 | -92.3 | 3 | 4 | 0 | 15 | 230 |

## How to use

Please refer to our GitHub repository for details on training and evaluation.

Model size: 33.1B parameters (BF16, Safetensors)
