Improve model card: Add pipeline tag, library name, and abstract
This PR improves the model card by:
- Adding `pipeline_tag: image-to-image`, ensuring the model appears in relevant search filters on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=image-to-image).
- Specifying `library_name: diffusers`, which is indicated as the primary library used by the project.
- Including the paper abstract for a more comprehensive overview of the model.
README.md
CHANGED
@@ -1,57 +1,62 @@
---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

<div align="center">
<h1>X2Edit</h1>
<a href='https://arxiv.org/abs/2508.07607'><img src='https://img.shields.io/badge/arXiv-2508.07607-b31b1b.svg'></a>
<a href='https://github.com/OPPO-Mente-Lab/X2Edit'><img src='https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github'></a>
<a href='https://huggingface.co/datasets/OPPOer/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit Dataset-ffd21f.svg'></a>
<a href='https://huggingface.co/OPPOer/X2Edit'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit-ffd21f.svg'></a>
<a href='https://www.modelscope.cn/datasets/AIGCer-OPPO/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤗%20ModelScope-X2Edit Dataset-purple.svg'></a>
</div>

## Abstract

Existing open-source datasets for arbitrary-instruction image editing remain suboptimal, while a plug-and-play editing module compatible with community-prevalent generative models is notably absent. In this paper, we first introduce the X2Edit Dataset, a comprehensive dataset covering 14 diverse editing tasks, including subject-driven generation. We utilize the industry-leading unified image generation models and expert models to construct the data. Meanwhile, we design reasonable editing instructions with the VLM and implement various scoring mechanisms to filter the data. As a result, we construct 3.7 million high-quality data with balanced categories. Second, to better integrate seamlessly with community image generation models, we design task-aware MoE-LoRA training based on FLUX.1, with only 8% of the parameters of the full model. To further improve the final performance, we utilize the internal representations of the diffusion model and define positive/negative samples based on image editing types to introduce contrastive learning. Extensive experiments demonstrate that the model's editing performance is competitive among many excellent models. Additionally, the constructed dataset exhibits substantial advantages over existing open-source datasets. The open-source code, checkpoints, and datasets for X2Edit can be found at the following link: this https URL .

## Environment

Prepare the environment and install the required libraries:

```shell
$ git clone https://github.com/OPPO-Mente-Lab/X2Edit.git
$ cd X2Edit
$ conda create --name X2Edit python==3.11
$ conda activate X2Edit
$ pip install -r requirements.txt
```

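A quick, optional sanity check (not part of the official setup): the one-liner below assumes that `torch`, `diffusers`, and `transformers` are pulled in by `requirements.txt` and that a CUDA-capable GPU is visible.

```shell
# Optional: confirm the core libraries import and a GPU is available.
$ python -c "import torch, diffusers, transformers; print(torch.__version__, diffusers.__version__, transformers.__version__); print('CUDA available:', torch.cuda.is_available())"
```
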
## Inference

We provide inference scripts for editing images at resolutions of **1024** and **512**. You can also choose the base model for X2Edit, including **[FLUX.1-Krea](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev)**, **[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)**, **[FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)**, **[PixelWave](https://huggingface.co/mikeyandfriends/PixelWave_FLUX.1-dev_03)**, and **[shuttle-3-diffusion](https://huggingface.co/shuttleai/shuttle-3-diffusion)**, as well as an extra LoRA to integrate with the MoE-LoRA, including **[Turbo-Alpha](https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha)**, **[AntiBlur](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-AntiBlur)**, **[Midjourney-Mix2](https://huggingface.co/strangerzonehf/Flux-Midjourney-Mix2-LoRA)**, **[Super-Realism](https://huggingface.co/strangerzonehf/Flux-Super-Realism-LoRA)**, and **[Chatgpt-Ghibli](https://huggingface.co/openfree/flux-chatgpt-ghibli-lora)**. Choose the models you like and download them. For the MoE-LoRA, we will open-source a unified checkpoint that can be used for both 512 and 1024 resolutions.

Before executing the script, download **[Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)** (used to select the task type for the input instruction), a base model (**FLUX.1-Krea**, **FLUX.1-dev**, **FLUX.1-schnell**, or **shuttle-3-diffusion**), the **[MLLM](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)**, and **[Alignet](https://huggingface.co/OPPOer/X2I/blob/main/qwen2.5-vl-7b_proj.pt)**. All scripts follow analogous command patterns: simply replace the script filename while keeping the parameter configuration consistent.

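One possible way to fetch these checkpoints is shown below; the target directories are illustrative, and gated models such as FLUX.1-dev may require accepting the license and running `huggingface-cli login` first.

```shell
# Illustrative downloads; swap in whichever base model and extra LoRA you chose.
$ huggingface-cli download Qwen/Qwen3-8B --local-dir ./checkpoints/Qwen3-8B
$ huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ./checkpoints/FLUX.1-dev
$ huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ./checkpoints/Qwen2.5-VL-7B-Instruct
$ huggingface-cli download OPPOer/X2I qwen2.5-vl-7b_proj.pt --local-dir ./checkpoints/X2I
$ huggingface-cli download OPPOer/X2Edit --local-dir ./checkpoints/X2Edit
```

With the checkpoints in place, run the inference script:
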
```shell
$ python infer.py --device cuda --pixel 1024 --num_experts 12 --base_path BASE_PATH --qwen_path QWEN_PATH --lora_path LORA_PATH --extra_lora_path EXTRA_LORA_PATH
```

**device:** The device used for inference. default: `cuda`<br>
**pixel:** The resolution of the input image; you can choose from **[512, 1024]**. default: `1024`<br>
**num_experts:** The number of experts in the MoE. default: `12`<br>
**base_path:** The path of the base model.<br>
**qwen_path:** The path of the model used to select the task type for the input instruction. We use **Qwen3-8B** here.<br>
**lora_path:** The path of the MoE-LoRA in X2Edit.<br>
**extra_lora_path:** The path of the extra LoRA for plug-and-play use. default: `None`<br>

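For instance, a 512-resolution run that also plugs in an extra community LoRA might look like the sketch below; the checkpoint paths are placeholders for wherever you stored your downloads, not paths shipped with the repository.

```shell
# Hypothetical paths; point them at your actual downloads.
$ python infer.py --device cuda --pixel 512 --num_experts 12 \
    --base_path ./checkpoints/FLUX.1-dev \
    --qwen_path ./checkpoints/Qwen3-8B \
    --lora_path ./checkpoints/X2Edit \
    --extra_lora_path ./checkpoints/FLUX.1-Turbo-Alpha
```
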
## Citation

🌟 If you find our work helpful, please consider citing our paper and leaving us valuable stars.

```bibtex
@misc{ma2025x2editrevisitingarbitraryinstructionimage,
      title={X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning},
      author={Jian Ma and Xujie Zhu and Zihao Pan and Qirong Peng and Xu Guo and Chen Chen and Haonan Lu},
      year={2025},
      eprint={2508.07607},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.07607},
}
```