---
library_name: keras-hub
pipeline_tag: text-to-image
---

### Model Overview

[Stable Diffusion 3.5](https://stability.ai/learning-hub/stable-diffusion-3-5-prompt-guide) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that delivers greatly improved image quality, typography, complex-prompt understanding, and resource efficiency.

For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Community License. For an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.

## Links

* [SD3.5 Quickstart Notebook](https://colab.sandbox.google.com/gist/laxmareddyp/55daf77f87730c3b3f498318672f70b3/stablediffusion3_5-quckstart-notebook.ipynb)
* [SD3.5 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/)
* [SD3.5 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below, and a short preset-selection sketch follows the table.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| `stable_diffusion_3.5_large` | 9.05B | 9-billion-parameter model, including CLIP-L and CLIP-G text encoders, an MMDiT generative model, and a VAE autoencoder. Developed by Stability AI. |
| `stable_diffusion_3.5_large_turbo` | 9.05B | 9-billion-parameter model, including CLIP-L and CLIP-G text encoders, an MMDiT generative model, and a VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
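
Both presets load through the same task APIs; the practical difference is the sampling configuration. A minimal sketch, with the caveat that the `num_steps` and `guidance_scale` values below are illustrative assumptions rather than official recommendations:

```python
import keras_hub

# The base model relies on classifier-free guidance and more sampling steps.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large", height=512, width=512
)
text_to_image.generate(
    "a photograph of an astronaut riding a horse",
    num_steps=28,        # assumed step count for the base model
    guidance_scale=7.0,  # assumed guidance scale
)

# The turbo model is timestep-distilled, so it uses far fewer steps and
# does not need classifier-free guidance.
turbo = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large_turbo", height=512, width=512
)
turbo.generate(
    "a photograph of an astronaut riding a horse",
    num_steps=4,  # assumed few-step setting for the distilled model
)
```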

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model description:** This model generates images from text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL) and QK-normalization to improve training stability.

## Example Usage

```python
!pip install -U keras-hub
!pip install -U keras
```
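
Keras 3 selects its compute backend from the `KERAS_BACKEND` environment variable, which must be set before `keras` is imported. A minimal setup sketch (the choice of `"jax"` is only an example; `"tensorflow"` and `"torch"` work as well):

```python
import os

# Choose the backend before importing keras or keras_hub.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"

import keras
import keras_hub
```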

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "stable_diffusion_3.5_large_turbo"
)

# Randomly initialized Stable Diffusion 3 model with custom config
# (constructor arguments for the sub-models elided).
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "stable_diffusion_3.5_large_turbo", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "stable_diffusion_3.5_large_turbo", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts; image and mask batch shapes must match.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    reference_images,
    reference_masks,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "stable_diffusion_3.5_large_turbo", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
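
The `generate` calls above return the decoded images directly. A small, hypothetical post-processing sketch for writing them to disk, assuming Pillow is installed and that the outputs are `uint8` RGB arrays (verify the returned dtype in your environment):

```python
from PIL import Image

images = text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)
for i, image in enumerate(images):
    # Assumes each element is an (H, W, 3) uint8 array.
    Image.fromarray(image).save(f"output_{i}.png")
```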

## Example Usage with Hugging Face URI

```python
!pip install -U keras-hub
!pip install -U keras
```

```python
import keras_hub
import numpy as np

# Pretrained Stable Diffusion 3 model.
model = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3.5_large_turbo"
)

# Randomly initialized Stable Diffusion 3 model with custom config
# (constructor arguments for the sub-models elided).
vae = keras_hub.models.VAEBackbone(...)
clip_l = keras_hub.models.CLIPTextEncoder(...)
clip_g = keras_hub.models.CLIPTextEncoder(...)
model = keras_hub.models.StableDiffusion3Backbone(
    mmdit_patch_size=2,
    mmdit_num_heads=4,
    mmdit_hidden_dim=256,
    mmdit_depth=4,
    mmdit_position_size=192,
    vae=vae,
    clip_l=clip_l,
    clip_g=clip_g,
)

# Image-to-image example.
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)

# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)

# Inpainting example.
reference_image = np.ones((1024, 1024, 3), dtype="float32")
reference_mask = np.ones((1024, 1024), dtype="float32")
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset(
    "hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512
)
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
)

# Generate with batched prompts; image and mask batch shapes must match.
reference_images = np.ones((2, 512, 512, 3), dtype="float32")
reference_masks = np.ones((2, 512, 512), dtype="float32")
inpaint.generate(
    reference_images,
    reference_masks,
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
)

# Generate with different `num_steps`, `guidance_scale` and `strength`.
inpaint.generate(
    reference_image,
    reference_mask,
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)

# Text-to-image example.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512
)
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
)

# Generate with batched prompts.
text_to_image.generate(
    ["cute wallpaper art of a cat", "cute wallpaper art of a dog"]
)

# Generate with different `num_steps` and `guidance_scale`.
text_to_image.generate(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_steps=50,
    guidance_scale=5.0,
)

# Generate with `negative_prompts`.
text_to_image.generate(
    {
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
```
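
The same `hf://` scheme works in the other direction when publishing checkpoints of your own, as described in the KerasHub Model Publishing Guide linked above. A hedged sketch (the local directory and Hugging Face repo name are placeholders):

```python
import keras_hub

backbone = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://keras/stable_diffusion_3.5_large_turbo"
)

# Save the (possibly fine-tuned) model as a local preset directory...
backbone.save_to_preset("./my_sd3.5_preset")

# ...then upload it to a Hugging Face repo you control.
keras_hub.upload_preset(
    "hf://your-username/my-stable-diffusion-3.5", "./my_sd3.5_preset"
)
```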