---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

<div>
<a href="https://river-zhang.github.io/zechuanzhang//" target="_blank">Zechuan Zhang</a>&emsp;
<a href="https://horizonwind2004.github.io/" target="_blank">Ji Xie</a>&emsp;
<a href="https://yulu.net.cn/" target="_blank">Yu Lu</a>&emsp;
<a href="https://z-x-yang.github.io/" target="_blank">Zongxin Yang</a>&emsp;
<a href="https://scholar.google.com/citations?user=RMSuNFwAAAAJ&hl=zh-CN&oi=ao" target="_blank">Yi Yang✉</a>&emsp;
</div>
<div>
ReLER, CCAI, Zhejiang University; Harvard University
</div>
<div>
<sup>✉</sup>Corresponding Author
</div>
<div>
<a href="https://arxiv.org/abs/2504.20690" target="_blank">arXiv</a>&emsp;
<a href="https://river-zhang.github.io/ICEdit-gh-pages/" target="_blank">Project Page</a>
</div>

<div style="width: 80%; margin:auto;">
<img style="width:100%; display: block; margin: auto;" src="docs/images/teaser.png">
<p style="text-align: left;">We present In-Context Edit, a novel approach that achieves state-of-the-art instruction-based editing <b>using just 0.5% of the training data and 1% of the parameters required by prior SOTA methods</b>. The first row illustrates a series of multi-turn edits, executed with high precision, while the second and third rows highlight diverse, visually impressive single-turn editing results from our method.</p>
</div>

:open_book: For more visual results, go check out our <a href="https://river-zhang.github.io/ICEdit-gh-pages/" target="_blank">project page</a>.

This repository contains the official implementation of _ICEdit_.

<div align="left">

# To Do List

- [x] Inference Code
- [ ] Inference-time Scaling with VLM
- [x] Pretrained Weights
- [ ] More Inference Demos
- [x] Gradio demo
- [ ] ComfyUI demo
- [ ] Training Code

# News
- **[2025/4/30]** 🔥 We release the inference code and [pretrained weights](https://huggingface.co/sanaka87/ICEdit-MoE-LoRA/tree/main) on Hugging Face 🤗!
- **[2025/4/30]** 🔥 We release the [paper](https://arxiv.org/abs/2504.20690) on arXiv!
- **[2025/4/29]** We release the [project page](https://river-zhang.github.io/ICEdit-gh-pages/) and demo video! Code will be made available next week~ Happy Labor Day!

# Installation

## Conda environment setup

```bash
conda create -n icedit python=3.10
conda activate icedit
pip install -r requirements.txt
```

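Before running inference, you can optionally confirm that PyTorch sees your GPU. This is a minimal check, assuming `requirements.txt` installs a CUDA-enabled PyTorch build (which the inference scripts rely on):

```bash
# Sanity check: print the installed PyTorch version and whether a CUDA GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
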
## Download pretrained weights

If you can connect to Hugging Face, you do not need to download the weights manually. Otherwise, download the following weights to a local directory (a download sketch follows the list):

- [Flux.1-fill-dev](https://huggingface.co/black-forest-labs/flux.1-fill-dev).
- [ICEdit-MoE-LoRA](https://huggingface.co/sanaka87/ICEdit-MoE-LoRA).

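For example, both repositories can be fetched with the Hugging Face CLI. This is a minimal sketch: the target directories are placeholders, and the FLUX weights may require logging in and accepting the model license on Hugging Face first.

```bash
# Requires the CLI extra of huggingface_hub: pip install -U "huggingface_hub[cli]"
huggingface-cli download black-forest-labs/FLUX.1-Fill-dev --local-dir /path/to/flux.1-fill-dev
huggingface-cli download sanaka87/ICEdit-MoE-LoRA --local-dir /path/to/ICEdit-MoE-LoRA
```
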
## Inference in bash (w/o VLM Inference-time Scaling)

Now you can have a try!

> Our model can **only edit images with a width of 512 pixels** (there is no restriction on the height). If you pass in an image with a width other than 512 pixels, the model will automatically resize it to 512 pixels.

> If the model fails to generate the expected results, try changing the `--seed` parameter. Inference-time scaling with a VLM can substantially improve the results.

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --seed 42
```

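As noted above, inputs are automatically resized to a width of 512 pixels. If you prefer to control the resizing yourself, a minimal Pillow one-liner works (this assumes Pillow is installed alongside the requirements; `assets/girl_512.png` is just an example output path):

```bash
# Pre-resize an image to a width of 512 px, scaling the height proportionally.
python -c "from PIL import Image; im = Image.open('assets/girl.png'); im.resize((512, round(im.height * 512 / im.width))).save('assets/girl_512.png')"
```
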
Editing a 512×768 image requires 35 GB of GPU memory. If you need to run on a system with 24 GB of GPU memory (for example, an NVIDIA RTX 3090), you can add the `--enable-model-cpu-offload` parameter.

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --enable-model-cpu-offload
```

If you have downloaded the pretrained weights locally, pass their paths at inference time, as in:

```bash
python scripts/inference.py --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --flux-path /path/to/flux.1-fill-dev \
    --lora-path /path/to/ICEdit-MoE-LoRA
```

## Inference in Gradio Demo

We provide a Gradio demo so you can edit images in a more user-friendly way. Run the following command to start the demo:

```bash
python scripts/gradio_demo.py --port 7860
```

As with the inference script, you can add the `--enable-model-cpu-offload` flag to run the demo on a system with 24 GB of GPU memory, and if you have downloaded the pretrained weights locally, pass their paths as well. All three flags are optional:

```bash
python scripts/gradio_demo.py --port 7860 \
    --flux-path /path/to/flux.1-fill-dev \
    --lora-path /path/to/ICEdit-MoE-LoRA \
    --enable-model-cpu-offload
```

Then you can open the link in your browser to edit images.

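If the demo runs on a remote GPU server, one common option is to forward the demo port over SSH and open it locally. This is an illustrative sketch only; adjust the user, host, and port to your setup:

```bash
# Forward the remote demo port to your local machine, then browse to http://localhost:7860
ssh -L 7860:localhost:7860 user@remote-server
```
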
### 🎨 Enjoy your editing!

# Comparison with Commercial Models

<div align="center">
<div style="width: 80%; text-align: left; margin:auto;">
<img style="width:100%" src="docs/images/gpt4o_comparison.png">
<p style="text-align: left;">Compared with commercial models such as Gemini and GPT-4o, our method is comparable to, and in some cases superior to, them in terms of character ID preservation and instruction following. <b>Unlike these commercial models, ours is open source, with lower cost, faster speed (about 9 seconds per image), and strong performance</b>.</p>
</div>
</div>

<div align="left">

# BibTeX
If this work is helpful for your research, please consider citing the following BibTeX entry.

```
@misc{zhang2025ICEdit,
  title={In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer},
  author={Zechuan Zhang and Ji Xie and Yu Lu and Zongxin Yang and Yi Yang},
  year={2025},
  eprint={2504.20690},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.20690},
}
```