Add pipeline tag and library name + include Github README (#1)
- Add pipeline tag and library name + include Github README (682127e2505360f3922203c3635092905d97a569)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,3 +1,152 @@
---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

<div>
<a href="https://river-zhang.github.io/zechuanzhang//" target="_blank">Zechuan Zhang</a>
<a href="https://horizonwind2004.github.io/" target="_blank">Ji Xie</a>
<a href="https://yulu.net.cn/" target="_blank">Yu Lu</a>
<a href="https://z-x-yang.github.io/" target="_blank">Zongxin Yang</a>
<a href="https://scholar.google.com/citations?user=RMSuNFwAAAAJ&hl=zh-CN&oi=ao" target="_blank">Yi Yang✉</a>
</div>
<div>
ReLER, CCAI, Zhejiang University; Harvard University
</div>
<div>
<sup>✉</sup>Corresponding Author
</div>
<div>
<a href="https://arxiv.org/abs/2504.20690" target="_blank">arXiv</a>
<a href="https://river-zhang.github.io/ICEdit-gh-pages/" target="_blank">Project Page</a>
</div>


<div style="width: 80%; margin:auto;">
<img style="width:100%; display: block; margin: auto;" src="docs/images/teaser.png">
<p style="text-align: left;">We present In-Context Edit, a novel approach that achieves state-of-the-art instruction-based editing <b>using just 0.5% of the training data and 1% of the parameters required by prior SOTA methods</b>. The first row illustrates a series of multi-turn edits, executed with high precision, while the second and third rows highlight diverse, visually impressive single-turn editing results from our method.</p>
</div>

:open_book: For more visual results, check out our <a href="https://river-zhang.github.io/ICEdit-gh-pages/" target="_blank">project page</a>.

This repository contains the official implementation of _ICEdit_.

<div align="left">

# To Do List

- [x] Inference Code
- [ ] Inference-time Scaling with VLM
- [x] Pretrained Weights
- [ ] More Inference Demos
- [x] Gradio demo
- [ ] ComfyUI demo
- [ ] Training Code

# News
- **[2025/4/30]** 🔥 We release the inference code and [pretrained weights](https://huggingface.co/sanaka87/ICEdit-MoE-LoRA/tree/main) on Hugging Face 🤗!
- **[2025/4/30]** 🔥 We release the [paper](https://arxiv.org/abs/2504.20690) on arXiv!
- **[2025/4/29]** We release the [project page](https://river-zhang.github.io/ICEdit-gh-pages/) and demo video! Code will be made available next week~ Happy Labor Day!

# Installation

## Conda environment setup

```bash
conda create -n icedit python=3.10
conda activate icedit
pip install -r requirements.txt
```
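
A quick sanity check after installation can save debugging time later. The snippet below is only a minimal sketch: it assumes `requirements.txt` installs a CUDA-enabled PyTorch build, and simply verifies that a GPU is visible to Python.

```bash
# Optional sanity check: prints the installed torch version and whether a GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```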

## Download pretrained weights

If you can connect to Hugging Face, you do not need to download the weights manually; they will be fetched automatically. Otherwise, download the following weights to a local directory (see the sketch after this list):

- [Flux.1-fill-dev](https://huggingface.co/black-forest-labs/flux.1-fill-dev).
- [ICEdit-MoE-LoRA](https://huggingface.co/sanaka87/ICEdit-MoE-LoRA).

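One way to pre-download the weights is the Hugging Face CLI. This is only a sketch, not part of the official instructions: it assumes the `huggingface_hub` CLI is installed, that you have accepted the FLUX.1 Fill license on its model page (the repo is gated, so `huggingface-cli login` may be required), and the target directories are arbitrary.

```bash
# Minimal sketch: download both repos into local folders of your choice.
huggingface-cli download black-forest-labs/FLUX.1-Fill-dev --local-dir ./weights/flux.1-fill-dev
huggingface-cli download sanaka87/ICEdit-MoE-LoRA --local-dir ./weights/ICEdit-MoE-LoRA
```

These directories are then what the `--flux-path` / `--lora-path` examples below would point at.
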
## Inference in bash (w/o VLM Inference-time Scaling)

Now you can give it a try!

> Our model can **only edit images with a width of 512 pixels** (there is no restriction on the height). If you pass in an image with a width other than 512 pixels, the model will automatically resize it to 512 pixels.

> If the model fails to generate the expected results, try changing the `--seed` parameter. Inference-time scaling with a VLM can substantially improve the results.

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --seed 42
```
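
If one seed does not produce the edit you want, a small seed sweep is a cheap way to explore, as the note above suggests. A minimal sketch with arbitrary seed values; check how `scripts/inference.py` names its outputs so successive runs do not overwrite each other.

```bash
# Hypothetical seed sweep: re-run the same edit with a few arbitrary seeds
# and keep whichever output looks best.
for seed in 0 42 123 2025; do
    python scripts/inference.py --image assets/girl.png \
        --instruction "Make her hair dark green and her clothes checked." \
        --seed "$seed"
done
```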

Editing a 512×768 image requires 35 GB of GPU memory. If you need to run on a system with 24 GB of GPU memory (for example, an NVIDIA RTX 3090), you can add the `--enable-model-cpu-offload` parameter.

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --enable-model-cpu-offload
```
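
If you are unsure how much GPU memory you have, a quick query (assuming an NVIDIA GPU with `nvidia-smi` on the PATH) tells you whether the offload flag is needed:

```bash
# Reports the name, total memory, and currently free memory of each visible GPU.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```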

If you have downloaded the pretrained weights locally, pass their paths at inference time, as in:

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --flux-path /path/to/flux.1-fill-dev \
    --lora-path /path/to/ICEdit-MoE-LoRA
```

## Inference in Gradio Demo

We provide a Gradio demo so you can edit images in a more user-friendly way. Run the following command to start the demo:

```bash
python scripts/gradio_demo.py --port 7860
```

As with the inference script, add `--enable-model-cpu-offload` if you are running on a system with 24 GB of GPU memory, and pass `--flux-path` / `--lora-path` if you have downloaded the pretrained weights locally. All three flags are optional:

```bash
python scripts/gradio_demo.py --port 7860 \
    --flux-path /path/to/flux.1-fill-dev \
    --lora-path /path/to/ICEdit-MoE-LoRA \
    --enable-model-cpu-offload
```

Then open the link printed in the terminal in your browser to edit images.
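
If the demo is running on a remote server rather than your local machine, one common option (not specific to this project) is SSH port forwarding; the host name below is a placeholder.

```bash
# Forward the demo port to your local machine, then open http://localhost:7860 locally.
ssh -N -L 7860:localhost:7860 user@remote-server
```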

### 🎨 Enjoy your editing!


# Comparison with Commercial Models

<div align="center">
<div style="width: 80%; text-align: left; margin:auto;">
<img style="width:100%" src="docs/images/gpt4o_comparison.png">
<p style="text-align: left;">Compared with commercial models such as Gemini and GPT-4o, our method is comparable to, and in some cases superior to, them in terms of character ID preservation and instruction following. <b>Unlike these closed models, ours is fully open-source, with lower cost, faster speed (about 9 seconds per image), and strong performance</b>.</p>
</div>

<div align="left">

# BibTeX
If this work is helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@misc{zhang2025ICEdit,
      title={In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer},
      author={Zechuan Zhang and Ji Xie and Yu Lu and Zongxin Yang and Yi Yang},
      year={2025},
      eprint={2504.20690},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.20690},
}
```