Introduction
Paper: Paper,
Github: Github,
Page: Page,
SFT Dataset: OmniAlign-V,
DPO Dataset: OmniAlign-V-DPO,
MM-AlignBench: MM-AlignBench
Checkpoints: LLaVANext-OA-7B, LLaVANext-OA-32B, LLaVANext-OA-32B-DPO
This is the official repository for LLaVANext-OmniAlign (OA)-7B, introduced in OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
LLaVANext-OmniAlign-7B is built on the LLaVA-Next architecture with InternLM2.5-7B-chat as the language model.
Training on the combined LLaVA-Next-SFT-738k-multimodal and OmniAlign-V datasets significantly improves the alignment of MLLMs with human preference and also improves performance on common downstream tasks, especially MMVet and MMMU.
Performance
Integrating the OmniAlign-V datasets into the Supervised Fine-tuning (SFT) stage not only significantly improves the alignment of MLLMs with human preference, but also improves performance on common downstream tasks, especially MMVet and MMMU. For the 7B LLaVANext model, for example, MMVet rises from 41.8 to 47.7 and MMMU from 44.1 to 46.8.
Model | Data | LLM | MM-AlignBench | WildVision | MIA-Bench | MMVet | MMMU | MMBenchV1.1 | AI2D | OCRBench |
---|---|---|---|---|---|---|---|---|---|---|
LLaVA | LLaVANext-778k | InternLM2.5-7B | 3.6 / -82.1 | 18.4 / -55.1 | 75.4 | 41.2 | 42.6 | 73.6 | 74.1 | 39.7 |
LLaVA | OmniAlign-V_mix | InternLM2.5-7B | 50.0 / +3.8 | 28.2 / -34.6 | 85.4 | 43.5 | 43.3 | 73.7 | 74.7 | 41.3 |
Δ | | | + 46.4 / 85.9 | + 9.8 / 20.5 | + 10.0 | + 2.3 | + 0.7 | + 0.1 | + 0.6 | + 1.6 |
LLaVANext | LLaVANext-778k | InternLM2.5-7B | 20.6 / -42.7 | 23.4 / -45.0 | 76.9 | 41.8 | 44.1 | 75.1 | 74.7 | 56.2 |
LLaVANext | OmniAlign-V_mix | InternLM2.5-7B | 57.1 / +11.1 | 29.6 / -31.3 | 86.7 | 47.7 | 46.8 | 74.9 | 77.5 | 58.9 |
Δ | | | + 36.5 / 53.8 | + 6.2 / 13.7 | + 9.8 | + 5.9 | + 2.7 | - 0.2 | + 2.8 | + 2.7 |
LLaVANext | LLaVANext-778k | Qwen2.5-32B | 26.6 / -29.0 | 25.2 / -41.3 | 86.0 | 47.7 | 55.2 | 79.3 | 79.6 | 55.9 |
LLaVANext | OmniAlign-V_mix | Qwen2.5-32B | 62.3 / +19.4 | 40.2 / -14.9 | 89.6 | 56.9 | 60.7 | 80.6 | 81.7 | 55.9 |
Δ | | | + 35.7 / 48.4 | + 15.0 / 26.4 | + 3.6 | + 9.2 | + 5.5 | + 1.3 | + 2.1 | + 0.0 |
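For MM-AlignBench and WildVision, each A / B cell reports Winning Rate / Reward. The rows of signed values give the change from LLaVANext-778k to OmniAlign-V_mix for the model and LLM in the two rows above them.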
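The gain rows in the table are plain differences between the OmniAlign-V_mix score and the LLaVANext-778k score. As a minimal sketch, the snippet below recomputes them for the LLaVANext / InternLM2.5-7B pair using the scores copied from the table; the helper name `score_gain` and the dictionary layout are illustrative assumptions, not part of the released code.

```python
from typing import Dict, Tuple, Union

# A score is either a single number or a (winning rate, reward) pair.
Score = Union[float, Tuple[float, float]]


def score_gain(base: Score, new: Score) -> Score:
    """Return new - base, element-wise for (winning rate, reward) pairs.

    Hypothetical helper for illustration; results are rounded to one
    decimal place to match the precision used in the table.
    """
    if isinstance(base, tuple):
        return tuple(round(n - b, 1) for b, n in zip(base, new))
    return round(new - base, 1)


# LLaVANext + InternLM2.5-7B trained on LLaVANext-778k (from the table).
baseline: Dict[str, Score] = {
    "MM-AlignBench": (20.6, -42.7),
    "WildVision": (23.4, -45.0),
    "MIA-Bench": 76.9,
    "MMVet": 41.8,
    "MMMU": 44.1,
    "MMBenchV1.1": 75.1,
    "AI2D": 74.7,
    "OCRBench": 56.2,
}

# Same model trained on OmniAlign-V_mix (from the table).
omnialign: Dict[str, Score] = {
    "MM-AlignBench": (57.1, 11.1),
    "WildVision": (29.6, -31.3),
    "MIA-Bench": 86.7,
    "MMVet": 47.7,
    "MMMU": 46.8,
    "MMBenchV1.1": 74.9,
    "AI2D": 77.5,
    "OCRBench": 58.9,
}

gains = {name: score_gain(baseline[name], omnialign[name]) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: {gain}")
```

MMBenchV1.1 is the only benchmark in this pair where the difference is negative, which matches the "- 0.2" entry in the table.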
How to use
Please refer to our Github for more details about training and evaluation.