PhoenixZ/LLaVANext-OmniAlign-7B

Introduction

Paper: Paper

GitHub: GitHub

Page: Page

Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO

This is the official repository of LLaVANext-OmniAlign (OA)-7B, introduced in the paper "OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference".

LLaVANext-OmniAlign-7B uses the LLaVA-Next architecture with InternLM2.5-7B-chat as its language model.
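A distinguishing feature of the LLaVA-Next architecture is its "AnyRes" image handling: each input image is matched to the best-fitting resolution from a fixed set of candidate grids, split into encoder-sized tiles, and accompanied by one downsized global view. The sketch below is a re-implementation of that selection rule (maximize the effective resolution after aspect-preserving downscaling, break ties by least wasted area), modeled on the public LLaVA-NeXT code. The candidate list `GRID_PINPOINTS` and the 336-pixel `TILE_SIZE` are assumptions borrowed from the original LLaVA-NeXT defaults, not values confirmed for this checkpoint; check the GitHub repository for the actual configuration.

```python
from typing import List, Tuple

# Assumption: default AnyRes candidates from the original LLaVA-NeXT release,
# written as (width, height). This checkpoint may use a different list.
GRID_PINPOINTS: List[Tuple[int, int]] = [
    (336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008),
]
# Assumption: a 336-pixel vision-encoder input (e.g. CLIP ViT-L/14-336).
TILE_SIZE = 336


def select_best_resolution(
    original_size: Tuple[int, int], candidates: List[Tuple[int, int]]
) -> Tuple[int, int]:
    """Pick the candidate (width, height) that preserves the most image detail.

    Maximizes the effective resolution after aspect-preserving downscaling,
    and breaks ties by choosing the candidate with the least wasted area.
    """
    orig_w, orig_h = original_size
    best_fit = candidates[0]
    max_effective = -1
    min_wasted = float("inf")
    for width, height in candidates:
        # Aspect-preserving scale so the image fits inside the candidate canvas.
        scale = min(width / orig_w, height / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(down_w * down_h, orig_w * orig_h)
        wasted = width * height - effective
        if effective > max_effective or (
            effective == max_effective and wasted < min_wasted
        ):
            max_effective, min_wasted, best_fit = effective, wasted, (width, height)
    return best_fit


def num_visual_tiles(best_fit: Tuple[int, int], tile: int = TILE_SIZE) -> int:
    """Encoder passes: one per grid tile plus one downsized global view."""
    width, height = best_fit
    return (width // tile) * (height // tile) + 1


if __name__ == "__main__":
    fit = select_best_resolution((800, 600), GRID_PINPOINTS)
    print(fit, num_visual_tiles(fit))  # (672, 672) 5
```

For an 800x600 image, the square 672x672 grid keeps the most detail, so the image is encoded as four tiles plus one global view; a tall 300x900 image would instead map to the 336x1008 grid.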

The model is trained on a combination of the LLaVA-Next-SFT-738k-multimodal and OmniAlign-V datasets. Compared with training on LLaVA-Next data alone, this mixture significantly improves alignment with human preference and also raises performance on common downstream benchmarks, most notably MMVet and MMMU.

Performance

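Adding the OmniAlign-V datasets to the supervised fine-tuning (SFT) stage improves human-preference alignment by a wide margin while matching or improving nearly every downstream benchmark. With LLaVANext and InternLM2.5-7B, for example, the MM-AlignBench winning rate rises from 20.6 to 57.1 and MMVet from 41.8 to 47.7; across all three settings, the only regression is a 0.2-point drop on MMBenchV1.1.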

Model	Data	LLM	MM-AlignBench	WildVision	MIA-Bench	MMVet	MMMU	MMBenchV1.1	AI2D	OCRBench
LLaVA	LLaVANext-778k	InternLM2.5-7B	3.6 / -82.1	18.4 / -55.1	75.4	41.2	42.6	73.6	74.1	39.7
LLaVA	OmniAlign-V_mix	InternLM2.5-7B	50.0 / +3.8	28.2 / -34.6	85.4	43.5	43.3	73.7	74.7	41.3
Δ			+46.4 / +85.9	+9.8 / +20.5	+10.0	+2.3	+0.7	+0.1	+0.6	+1.6
LLaVANext	LLaVANext-778k	InternLM2.5-7B	20.6 / -42.7	23.4 / -45.0	76.9	41.8	44.1	75.1	74.7	56.2
LLaVANext	OmniAlign-V_mix	InternLM2.5-7B	57.1 / +11.1	29.6 / -31.3	86.7	47.7	46.8	74.9	77.5	58.9
Δ			+36.5 / +53.8	+6.2 / +13.7	+9.8	+5.9	+2.7	-0.2	+2.8	+2.7
LLaVANext	LLaVANext-778k	Qwen2.5-32B	26.6 / -29.0	25.2 / -41.3	86.0	47.7	55.2	79.3	79.6	55.9
LLaVANext	OmniAlign-V_mix	Qwen2.5-32B	62.3 / +19.4	40.2 / -14.9	89.6	56.9	60.7	80.6	81.7	55.9
Δ			+35.7 / +48.4	+15.0 / +26.4	+3.6	+9.2	+5.5	+1.3	+2.1	+0.0

For MM-AlignBench and WildVision, each A / B entry gives the winning rate (A) and the reward (B).
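The A / B pairs can be made concrete with a short worked example. The sketch below assumes the WildVision-Bench style of scoring, in which a judge model compares each answer with a fixed reference answer and returns one of five verdicts: winning rate is the share of "better" or "much better" verdicts, and reward averages +100, +50, 0, -50 and -100 over the verdicts. These weights and the verdict names in `VERDICT_WEIGHTS` are assumptions; the paper is authoritative for the exact protocol. The second helper, `gain_row` (a hypothetical name), reproduces a gain row of the table as a plain column-wise difference between the OmniAlign-V_mix and LLaVANext-778k results.

```python
from typing import Dict, List, Tuple

# Assumed judge-verdict weights (WildVision-Bench style); check the paper
# for the exact protocol used by MM-AlignBench.
VERDICT_WEIGHTS: Dict[str, int] = {
    "much_better": 100, "better": 50, "tie": 0, "worse": -50, "much_worse": -100,
}


def win_rate_and_reward(verdicts: List[str]) -> Tuple[float, float]:
    """Return (winning rate in %, mean reward) for per-question verdicts."""
    n = len(verdicts)
    wins = sum(v in ("much_better", "better") for v in verdicts)
    reward = sum(VERDICT_WEIGHTS[v] for v in verdicts) / n
    return 100.0 * wins / n, reward


def gain_row(baseline: List[float], improved: List[float]) -> List[float]:
    """Column-wise change from the baseline row to the improved row, to 0.1."""
    return [round(new - old, 1) for old, new in zip(baseline, improved)]


if __name__ == "__main__":
    # Hypothetical verdicts for four questions, only to illustrate the formulas.
    print(win_rate_and_reward(["much_better", "better", "tie", "worse"]))  # (50.0, 25.0)

    # LLaVANext + InternLM2.5-7B rows from the table, MIA-Bench through OCRBench.
    base = [76.9, 41.8, 44.1, 75.1, 74.7, 56.2]
    mix = [86.7, 47.7, 46.8, 74.9, 77.5, 58.9]
    print(gain_row(base, mix))  # [9.8, 5.9, 2.7, -0.2, 2.8, 2.7]
```

Under these assumed weights, a positive reward means the model's answers are judged better than the reference more often (or more strongly) than worse, which is why the reward can stay negative on WildVision even as the winning rate improves.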

How to use

Please refer to our GitHub repository for details on training and evaluation.
