Diffusers
Commit 0e95431 (verified) · committed by nielsr (HF Staff) · 1 parent: 83dac1f

Improve model card: Add pipeline tag, library name, and abstract

This PR improves the model card by:
- Adding `pipeline_tag: image-to-image`, ensuring the model appears in relevant search filters on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=image-to-image).
- Specifying `library_name: diffusers`, which is indicated as the primary library used by the project.
- Including the paper abstract for a more comprehensive overview of the model.

Files changed (1): README.md (+62, -57)

README.md (updated):

---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

<div align="center">
<h1>X2Edit</h1>
<a href='https://arxiv.org/abs/2508.07607'><img src='https://img.shields.io/badge/arXiv-2508.07607-b31b1b.svg'></a> &nbsp;
<a href='https://github.com/OPPO-Mente-Lab/X2Edit'><img src='https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github'></a> &nbsp;
<a href='https://huggingface.co/datasets/OPPOer/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit Dataset-ffd21f.svg'></a> &nbsp;
<a href='https://huggingface.co/OPPOer/X2Edit'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit-ffd21f.svg'></a> &nbsp;
<a href='https://www.modelscope.cn/datasets/AIGCer-OPPO/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤖%20ModelScope-X2Edit Dataset-purple.svg'></a>
</div>

## Abstract

Existing open-source datasets for arbitrary-instruction image editing remain suboptimal, while a plug-and-play editing module compatible with community-prevalent generative models is notably absent. In this paper, we first introduce the X2Edit Dataset, a comprehensive dataset covering 14 diverse editing tasks, including subject-driven generation. We utilize industry-leading unified image generation models and expert models to construct the data. Meanwhile, we design reasonable editing instructions with the VLM and implement various scoring mechanisms to filter the data. As a result, we construct 3.7 million high-quality samples with balanced categories. Second, to integrate seamlessly with community image generation models, we design task-aware MoE-LoRA training based on FLUX.1, with only 8% of the parameters of the full model. To further improve the final performance, we utilize the internal representations of the diffusion model and define positive/negative samples based on image editing types to introduce contrastive learning. Extensive experiments demonstrate that the model's editing performance is competitive among many excellent models. Additionally, the constructed dataset exhibits substantial advantages over existing open-source datasets. The open-source code, checkpoints, and datasets for X2Edit can be found at https://github.com/OPPO-Mente-Lab/X2Edit.

## Environment

Prepare the environment and install the required libraries:

```shell
$ git clone https://github.com/OPPO-Mente-Lab/X2Edit.git
$ cd X2Edit
$ conda create --name X2Edit python==3.11
$ conda activate X2Edit
$ pip install -r requirements.txt
```

## Inference

We provide inference scripts for editing images at resolutions of **1024** and **512**. In addition, you can choose the base model for X2Edit from **[FLUX.1-Krea](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev)**, **[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)**, **[FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)**, **[PixelWave](https://huggingface.co/mikeyandfriends/PixelWave_FLUX.1-dev_03)**, and **[shuttle-3-diffusion](https://huggingface.co/shuttleai/shuttle-3-diffusion)**, and choose a LoRA to integrate with the MoE-LoRA, including **[Turbo-Alpha](https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha)**, **[AntiBlur](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-AntiBlur)**, **[Midjourney-Mix2](https://huggingface.co/strangerzonehf/Flux-Midjourney-Mix2-LoRA)**, **[Super-Realism](https://huggingface.co/strangerzonehf/Flux-Super-Realism-LoRA)**, and **[Chatgpt-Ghibli](https://huggingface.co/openfree/flux-chatgpt-ghibli-lora)**. Choose the model you like and download it. For the MoE-LoRA, we will open-source a unified checkpoint that can be used for both the 512 and 1024 resolutions.
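
The checkpoints can be fetched with `huggingface-cli`. The sketch below picks **FLUX.1-dev** as the base model and **Turbo-Alpha** as the extra LoRA purely as examples, and the local directory names are arbitrary; note that FLUX.1-dev is a gated repository, so you may need to log in and accept its license on the Hub first.

```shell
# Example download: one base model, the X2Edit weights, and an optional extra LoRA.
# Adjust the local directories to match your own setup.
$ huggingface-cli login
$ huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ./FLUX.1-dev
$ huggingface-cli download OPPOer/X2Edit --local-dir ./X2Edit-weights
$ huggingface-cli download alimama-creative/FLUX.1-Turbo-Alpha --local-dir ./Turbo-Alpha
```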

Before executing the script, download **[Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)** (used to select the task type for the input instruction), a base model (**FLUX.1-Krea**, **FLUX.1-dev**, **FLUX.1-schnell**, or **shuttle-3-diffusion**), the **[MLLM](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)**, and **[Alignet](https://huggingface.co/OPPOer/X2I/blob/main/qwen2.5-vl-7b_proj.pt)**. All scripts follow analogous command patterns: simply replace the script filename while keeping the parameter configuration consistent.
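
These auxiliary models can be fetched the same way; a minimal sketch, with arbitrary local directory names (the Alignet projection is a single file inside the OPPOer/X2I repository):

```shell
# Task-type selector, MLLM, and the Alignet projection weights.
$ huggingface-cli download Qwen/Qwen3-8B --local-dir ./Qwen3-8B
$ huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ./Qwen2.5-VL-7B-Instruct
$ huggingface-cli download OPPOer/X2I qwen2.5-vl-7b_proj.pt --local-dir ./X2I
```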

Then run the inference script:

```shell
$ python infer.py --device cuda --pixel 1024 --num_experts 12 --base_path BASE_PATH --qwen_path QWEN_PATH --lora_path LORA_PATH --extra_lora_path EXTRA_LORA_PATH
```

**device:** The device used for inference. default: `cuda`<br>
**pixel:** The resolution of the input image; choose from **[512, 1024]**. default: `1024`<br>
**num_experts:** The number of experts in the MoE. default: `12`<br>
**base_path:** The path of the base model.<br>
**qwen_path:** The path of the model used to select the task type for the input instruction. We use **Qwen3-8B** here.<br>
**lora_path:** The path of the MoE-LoRA in X2Edit.<br>
**extra_lora_path:** The path of an extra LoRA for plug-and-play use, as in the example below. default: `None`<br>
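
For example, a 512-resolution run that also plugs in an extra LoRA might look like the following; all paths are hypothetical and should point to wherever you downloaded the checkpoints.

```shell
# Hypothetical paths matching the download sketches above.
$ python infer.py \
    --device cuda \
    --pixel 512 \
    --num_experts 12 \
    --base_path ./FLUX.1-dev \
    --qwen_path ./Qwen3-8B \
    --lora_path ./X2Edit-weights \
    --extra_lora_path ./Turbo-Alpha
```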

## Citation

🌟 If you find our work helpful, please consider citing our paper and giving the repository a star.

```bibtex
@misc{ma2025x2editrevisitingarbitraryinstructionimage,
  title={X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning},
  author={Jian Ma and Xujie Zhu and Zihao Pan and Qirong Peng and Xu Guo and Chen Chen and Haonan Lu},
  year={2025},
  eprint={2508.07607},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.07607},
}
```