Ovis-U1


Building on the Ovis series, Ovis-U1 is a 3-billion-parameter unified model that combines multimodal understanding, text-to-image generation, and image editing in a single framework.


The overall architecture of Ovis-U1 (cf. Fig. 2 in our report).

📦 Installation

Ovis-U1 has been tested with Python 3.10, PyTorch 2.4.0, Transformers 4.51.3, and DeepSpeed 0.15.4. See requirements.txt for the full list of package dependencies.

git clone git@github.com:AIDC-AI/Ovis-U1.git
conda create -n ovis-u1 python=3.10 -y
conda activate ovis-u1
cd Ovis-U1
pip install -r requirements.txt
pip install -e .
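
After installing, it can help to confirm that the environment matches the tested versions listed above. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages are installed under the PyPI distribution names `torch`, `transformers`, and `deepspeed`.

```python
from importlib import metadata

# Versions the README says Ovis-U1 was tested with.
# Keys are assumed to be the PyPI distribution names.
TESTED_VERSIONS = {
    "torch": "2.4.0",
    "transformers": "4.51.3",
    "deepspeed": "0.15.4",
}


def check_versions(tested=TESTED_VERSIONS):
    """Return {package: (installed_or_None, tested)} for every mismatch."""
    mismatches = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # A local build tag such as "2.4.0+cu121" still counts as 2.4.0.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions().items():
        print(f"{name}: installed={installed}, tested={wanted}")
```

A mismatch does not necessarily mean Ovis-U1 will fail; it only means the combination differs from the one that was tested.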

🛠️ Inference

For multimodal understanding, run:

python test_img_to_txt.py

For text-to-image generation, run:

python test_txt_to_img.py \
    --height 1024 \
    --width 1024  \
    --steps 50 \
    --seed 42 \
    --txt_cfg 5  
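
To compare several seeds, the command above can be wrapped in a small driver. This is a sketch under assumptions: the helper names `build_txt_to_img_cmd` and `run_seed_sweep` are hypothetical, only the flags shown above are used, and where the script writes its output (and whether successive runs overwrite it) is not covered here.

```python
import subprocess
import sys


def build_txt_to_img_cmd(seed, txt_cfg=5, steps=50, height=1024, width=1024):
    """Build the argument list for test_txt_to_img.py from the flags shown above."""
    return [
        sys.executable, "test_txt_to_img.py",
        "--height", str(height),
        "--width", str(width),
        "--steps", str(steps),
        "--seed", str(seed),
        "--txt_cfg", str(txt_cfg),
    ]


def run_seed_sweep(seeds, dry_run=True):
    """Print (default) or execute one generation command per seed."""
    commands = [build_txt_to_img_cmd(seed) for seed in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Must be launched from the repository root so the script is found.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seed_sweep([42, 43, 44])
```

With `dry_run=True` the commands are only printed, which makes it easy to inspect them before spending GPU time.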

For image editing, run:

python test_img_edit.py \
    --steps 50 \
    --img_cfg 1.5 \
    --txt_cfg 6  
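
This README does not state how `--img_cfg` and `--txt_cfg` are combined internally. As background only, one widely used scheme for guidance with separate image and text scales is the InstructPix2Pix formulation, sketched below on plain Python lists. Treating it as Ovis-U1's method is an assumption; the function name `dual_guidance` is hypothetical.

```python
def dual_guidance(pred_uncond, pred_img, pred_full, img_cfg=1.5, txt_cfg=6.0):
    """Combine three model predictions with separate image and text scales.

    pred_uncond: prediction with neither image nor text condition
    pred_img:    prediction conditioned on the input image only
    pred_full:   prediction conditioned on both image and text

    Assumption: InstructPix2Pix-style combination, not confirmed for Ovis-U1.
    """
    return [
        u + img_cfg * (i - u) + txt_cfg * (f - i)
        for u, i, f in zip(pred_uncond, pred_img, pred_full)
    ]


if __name__ == "__main__":
    # With both scales at 1.0 the result reduces to the fully conditioned prediction.
    print(dual_guidance([0.0], [1.0], [2.0], img_cfg=1.0, txt_cfg=1.0))
```

Under this scheme, a larger image scale keeps the result closer to the input image, while a larger text scale pushes it harder toward the edit instruction.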

📊 Performance

OpenCompass Multi-modal Academic Benchmarks

| Model | Avg | MMB | MMS | MMMU | MathVista | Hallusion | AI2D | OCRBench | MMVet |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 75.4 | 86 | 70.2 | 72.9 | 71.6 | 57 | 86.3 | 82.2 | 76.9 |
| InternVL2.5-2B | 59.9 | 70.9 | 54.3 | 43.2 | 51.1 | 42.3 | 74.9 | 80.2 | 62.6 |
| SAIL-VL-2B | 61 | 73.7 | 56.5 | 44.1 | 62.8 | 45.9 | 77.4 | 83.1 | 44.2 |
| InternVL3-2B | 61.1 | 78 | 61.1 | 48.7 | 57.6 | 41.9 | 78.6 | 83.1 | 67 |
| Qwen2.5-VL-3B | 64.5 | 76.8 | 56.3 | 51.2 | 61.2 | 46.6 | 81.4 | 82.8 | 60 |
| Ovis2-2B | 65.2 | 76.9 | 56.7 | 45.6 | 64.1 | 50.2 | 82.7 | 87.3 | 58.3 |
| SAIL-VL-1.5-2B | 67 | 78.5 | 62.6 | 46.4 | 67 | 50 | 83.7 | 89.1 | 58.8 |
| Ristretto-3B | 67.7 | 80.2 | 62.8 | 51.3 | 67.6 | 50.2 | 84.2 | 84.7 | 60.7 |
| Ovis-U1 | 69.6 | 77.8 | 61.3 | 51.1 | 69.4 | 56.3 | 85.6 | 88.3 | 66.7 |
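
The Avg column appears consistent with the unweighted mean of the eight benchmark columns, although the README does not say so explicitly. The short check below recomputes it for two rows copied from the table above; the helper name `mean_score` is hypothetical.

```python
# Per-benchmark scores copied from the table, in column order:
# MMB, MMS, MMMU, MathVista, Hallusion, AI2D, OCRBench, MMVet
SCORES = {
    "GPT-4o": [86, 70.2, 72.9, 71.6, 57, 86.3, 82.2, 76.9],
    "Ovis-U1": [77.8, 61.3, 51.1, 69.4, 56.3, 85.6, 88.3, 66.7],
}
# Avg values as reported in the table.
REPORTED_AVG = {"GPT-4o": 75.4, "Ovis-U1": 69.6}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to one decimal like the table."""
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    for model, values in SCORES.items():
        print(model, "computed:", mean_score(values), "reported:", REPORTED_AVG[model])
```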

GenEval

| Model | Overall | Single object | Two object | Counting | Colors | Position | Attribute binding |
|---|---|---|---|---|---|---|---|
| GPT-4o | 0.84 | 0.99 | 0.92 | 0.85 | 0.92 | 0.75 | 0.61 |
| BAGEL | 0.82 | 0.99 | 0.94 | 0.81 | 0.88 | 0.64 | 0.63 |
| BAGEL 📝 | 0.88 | 0.98 | 0.95 | 0.84 | 0.95 | 0.78 | 0.77 |
| UniWorld-V1 | 0.80 | 0.99 | 0.93 | 0.79 | 0.89 | 0.49 | 0.70 |
| UniWorld-V1 📝 | 0.84 | 0.98 | 0.93 | 0.81 | 0.89 | 0.74 | 0.71 |
| OmniGen | 0.68 | 0.98 | 0.84 | 0.66 | 0.74 | 0.40 | 0.43 |
| OmniGen2 | 0.80 | 1 | 0.95 | 0.64 | 0.88 | 0.55 | 0.76 |
| OmniGen2 📝 | 0.86 | 0.99 | 0.96 | 0.74 | 0.98 | 0.71 | 0.75 |
| Ovis-U1 | 0.89 | 0.98 | 0.98 | 0.90 | 0.92 | 0.79 | 0.75 |

📝 denotes results obtained with rewritten prompts.

DPG-Bench

| Model | Overall | Global | Entity | Attribute | Relation | Other |
|---|---|---|---|---|---|---|
| BAGEL | 85.07 | 88.94 | 90.37 | 91.29 | 90.82 | 88.67 |
| UniWorld-V1 | 81.38 | 83.64 | 88.39 | 88.44 | 89.27 | 87.22 |
| OmniGen | 81.16 | 87.90 | 88.97 | 88.47 | 87.95 | 83.56 |
| OmniGen2 | 83.57 | 88.81 | 88.83 | 90.18 | 89.37 | 90.27 |
| Ovis-U1 | 83.72 | 82.37 | 90.08 | 88.68 | 93.35 | 85.20 |

ImgEdit-Bench

| Model | Overall | Add | Adjust | Extract | Replace | Remove | Background | Style | Hybrid | Action |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 4.2 | 4.61 | 4.33 | 2.9 | 4.35 | 3.66 | 4.57 | 4.93 | 3.96 | 4.89 |
| MagicBrush | 1.90 | 2.84 | 1.58 | 1.51 | 1.97 | 1.58 | 1.75 | 2.38 | 1.62 | 1.22 |
| Instruct-P2P | 1.88 | 2.45 | 1.83 | 1.44 | 2.01 | 1.50 | 1.44 | 3.55 | 1.2 | 1.46 |
| AnyEdit | 2.45 | 3.18 | 2.95 | 1.88 | 2.47 | 2.23 | 2.24 | 2.85 | 1.56 | 2.65 |
| UltraEdit | 2.7 | 3.44 | 2.81 | 2.13 | 2.96 | 1.45 | 2.83 | 3.76 | 1.91 | 2.98 |
| OmniGen | 2.96 | 3.47 | 3.04 | 1.71 | 2.94 | 2.43 | 3.21 | 4.19 | 2.24 | 3.38 |
| Step1X-Edit | 3.06 | 3.88 | 3.14 | 1.76 | 3.40 | 2.41 | 3.16 | 4.63 | 2.64 | 2.52 |
| ICEdit | 3.05 | 3.58 | 3.39 | 1.73 | 3.15 | 2.93 | 3.08 | 3.84 | 2.04 | 3.68 |
| BAGEL | 3.2 | 3.56 | 3.31 | 1.7 | 3.3 | 2.62 | 3.24 | 4.49 | 2.38 | 4.17 |
| UniWorld-V1 | 3.26 | 3.82 | 3.64 | 2.27 | 3.47 | 3.24 | 2.99 | 4.21 | 2.96 | 2.74 |
| OmniGen2 | 3.44 | 3.57 | 3.06 | 1.77 | 3.74 | 3.2 | 3.57 | 4.81 | 2.52 | 4.68 |
| Ovis-U1 | 4.00 | 4.13 | 3.62 | 2.98 | 4.45 | 4.06 | 4.22 | 4.69 | 3.45 | 4.61 |

GEdit-Bench-EN

| Model | Avg | Background Change | Color Alteration | Material Modification | Motion Change | Portrait Beautification | Style Transfer | Subject Addition | Subject Removal | Subject Replacement | Text Modification | Tone Transformation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 7.534 | 7.205 | 6.491 | 6.607 | 8.096 | 7.768 | 6.961 | 7.622 | 8.331 | 8.067 | 7.427 | 8.301 |
| AnyEdit | 3.212 | 4.663 | 4.260 | 2.537 | 2.024 | 3.479 | 2.032 | 3.995 | 3.089 | 3.180 | 0.922 | 5.151 |
| Instruct-Pix2Pix | 3.684 | 3.825 | 5.182 | 3.688 | 3.509 | 4.339 | 4.560 | 3.461 | 2.031 | 4.237 | 0.955 | 4.733 |
| MagicBrush | 4.518 | 5.637 | 5.136 | 5.078 | 4.513 | 4.487 | 4.439 | 5.252 | 3.704 | 4.941 | 1.384 | 5.130 |
| OmniGen | 5.062 | 5.281 | 6.003 | 5.308 | 2.916 | 3.087 | 4.903 | 6.628 | 6.352 | 5.616 | 4.519 | 5.064 |
| Gemini | 6.315 | 6.781 | 6.369 | 6.040 | 6.938 | 5.591 | 4.676 | 7.501 | 6.447 | 7.003 | 5.765 | 6.350 |
| Step1X-Edit | 6.701 | 6.547 | 6.545 | 6.204 | 6.483 | 6.787 | 7.221 | 6.975 | 6.512 | 7.068 | 6.921 | 6.448 |
| Doubao | 6.754 | 7.430 | 7.095 | 6.339 | 6.973 | 6.972 | 6.767 | 7.674 | 6.748 | 7.447 | 3.471 | 7.383 |
| BAGEL | 6.519 | 7.324 | 6.909 | 6.381 | 4.753 | 4.573 | 6.150 | 7.896 | 7.164 | 7.021 | 7.320 | 6.218 |
| Ovis-U1 | 6.420 | 7.486 | 6.879 | 6.208 | 4.790 | 5.981 | 6.463 | 7.491 | 7.254 | 7.266 | 4.482 | 6.314 |

📚 Citation

If you find Ovis-U1 useful, please cite our paper:

@inproceedings{wang2025ovisu1,
title={Ovis-U1 Technical Report},
author={Ovis Team},
year={2025}
}

🙏 Acknowledgments

The code is built upon Ovis and FLUX.

📄 License

The project is released under the Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0, SPDX-License-Identifier: Apache-2.0).

🚨 Disclaimer

We used compliance-checking algorithms during training to ensure, to the best of our ability, that the trained model is compliant. Because the data is complex and language models are used in diverse scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.
