OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Abstract
Recent advancements in open-source multi-modal large language models (MLLMs) have primarily focused on enhancing foundational capabilities, leaving a significant gap in human preference alignment. This paper introduces OmniAlign-V, a comprehensive dataset of 200K high-quality training samples featuring diverse images, complex questions, and varied response formats to improve MLLMs' alignment with human preferences. We also present MM-AlignBench, a human-annotated benchmark specifically designed to evaluate MLLMs' alignment with human values. Experimental results show that fine-tuning MLLMs with OmniAlign-V, using Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), significantly enhances human preference alignment while maintaining or improving performance on standard VQA benchmarks, preserving the models' fundamental capabilities. Our datasets, benchmark, code and checkpoints have been released at https://github.com/PhoenixZ810/OmniAlign-V.
Community
- Paper
- GitHub: https://github.com/PhoenixZ810/OmniAlign-V
- Project Page
- SFT Dataset: OmniAlign-V
- DPO Dataset: OmniAlign-V-DPO
In this work, we make three key contributions: the OmniAlign-V SFT dataset, the OmniAlign-V-DPO dataset, and MM-AlignBench.
- OmniAlign-V SFT Dataset: An SFT dataset designed to improve the alignment of Multi-modal Large Language Models (MLLMs) with human preferences. It contains 205K high-quality image-question-answer pairs, featuring open-ended, creative questions and long, knowledge-rich, comprehensive answers.
- OmniAlign-V-DPO Dataset: A specialized dataset for Direct Preference Optimization (DPO). It uses the answers from the OmniAlign-V SFT dataset as positive samples and generates negative samples with LLaVANext-InternLM-7B via rejection sampling (see the sketch after this list).
- MM-AlignBench: A benchmark for evaluating MLLMs' alignment with human preferences. It includes 252 high-quality, human-annotated samples with diverse image types and open-ended questions. Modeled after Arena-style benchmarks, it uses GPT-4o as the judge model and Claude-Sonnet-3 as the reference model.
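To make the pairing scheme concrete, below is a minimal sketch of how such DPO preference pairs could be assembled: the OmniAlign-V SFT answer acts as the chosen response, and a rejection-sampled output from a weaker model as the rejected one. The function names, scoring heuristic, and JSON field names are illustrative assumptions, not the released pipeline or data schema.

```python
# Minimal sketch of building DPO preference pairs in the style described above.
# The positive ("chosen") answer comes from the OmniAlign-V SFT data; the negative
# ("rejected") answer is picked from several candidates sampled from a weaker MLLM
# (rejection sampling). All names and the scoring heuristic are illustrative.
import json


def sample_candidates(image: str, question: str, n: int = 4) -> list[str]:
    """Stand-in for sampling n answers from a weaker MLLM (e.g. LLaVANext-InternLM-7B)."""
    return [f"[candidate {i}] short answer about {image}" for i in range(n)]


def quality_score(answer: str) -> float:
    """Stand-in for a judge/reward score; here just a toy length heuristic."""
    return float(len(answer))


def build_dpo_pairs(sft_samples: list[dict]) -> list[dict]:
    pairs = []
    for s in sft_samples:
        candidates = sample_candidates(s["image"], s["question"])
        # Keep the lowest-scoring candidate as the rejected response.
        rejected = min(candidates, key=quality_score)
        pairs.append({
            "image": s["image"],
            "prompt": s["question"],
            "chosen": s["answer"],   # high-quality OmniAlign-V SFT answer
            "rejected": rejected,    # weak, rejection-sampled answer
        })
    return pairs


if __name__ == "__main__":
    sft_samples = [
        {"image": "demo.jpg", "question": "Describe the scene.",
         "answer": "A detailed, knowledge-rich description ..."},
    ]
    print(json.dumps(build_dpo_pairs(sft_samples), indent=2))
```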
Dataset Performance
Our OmniAlign-V SFT dataset not only significantly improves the alignment of MLLMs with human preferences, but also boosts their performance on common downstream tasks, particularly on benchmarks like MMVet and MMMU.
By incorporating a DPO stage using our OmniAlign-V-DPO dataset, we achieve even better alignment with human preferences. Notably, our LLaVANext-OA-32B model, built on the Qwen2.5-32B-Instruct foundation, surpasses Qwen2VL-72B on the MM-AlignBench.
MM-AlignBench
MM-AlignBench is now supported in VLMEvalKit, a powerful toolkit for evaluating over 200 MLLMs across various benchmarks. For more details, check out the VLMEvalKit repository.
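As a rough starting point, the snippet below shows how an evaluation run could be launched from Python. It is a sketch, not official usage: it assumes VLMEvalKit is installed and invoked from its repository root via the standard run.py entry point, and the dataset key ("MMAlignBench") and model name are assumptions to be checked against the VLMEvalKit documentation.

```python
# Hedged sketch: launch a VLMEvalKit evaluation on MM-AlignBench from Python.
# Assumes VLMEvalKit is installed and this is run from its repository root; the
# dataset key ("MMAlignBench") and model name below are assumptions -- consult
# the VLMEvalKit docs for the exact identifiers it registers.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--data", "MMAlignBench",   # assumed benchmark key for MM-AlignBench
        "--model", "GPT4o",         # replace with the MLLM you want to evaluate
    ],
    check=True,
)
```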