|
--- |
|
library_name: keras-hub |
|
pipeline_tag: text-to-image |
|
--- |
|
### Model Overview |
|
[Stable Diffusion 3.5](https://stability.ai/learning-hub/stable-diffusion-3-5-prompt-guide) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that offers greatly improved image quality, typography, complex prompt understanding, and resource efficiency.
|
|
|
For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper). |
|
|
|
Please note: this model is released under the Stability Community License. For an enterprise license, visit Stability.ai or [contact us](https://stability.ai/enterprise) for commercial licensing details.
|
|
|
## Links |
|
|
|
* [SD3.5 Quickstart Notebook](https://colab.sandbox.google.com/gist/laxmareddyp/55daf77f87730c3b3f498318672f70b3/stablediffusion3_5-quckstart-notebook.ipynb)
|
* [SD3.5 API Documentation](https://keras.io/keras_hub/api/models/stable_diffusion_3/) |
|
* [SD3.5 Model Card](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) |
|
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/) |
|
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/) |
|
|
|
## Presets |
|
|
|
The following model checkpoints are provided by the Keras team. Full code examples for each are available below. |
|
| Preset name | Parameters | Description | |
|
|----------------|------------|--------------------------------------------------| |
|
| stable_diffusion_3.5_large | 9.05B | 9 billion parameters, including CLIP L and CLIP G text encoders, an MMDiT generative model, and a VAE autoencoder. Developed by Stability AI. |
|
| stable_diffusion_3.5_large_turbo | 9.05B | 9 billion parameters, including CLIP L and CLIP G text encoders, an MMDiT generative model, and a VAE autoencoder. A timestep-distilled version that removes classifier-free guidance and generates in fewer steps. Developed by Stability AI. |
|
|
|
### Model Description |
|
|
|
- **Developed by:** Stability AI |
|
- **Model type:** MMDiT text-to-image generative model |
|
- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) |
|
that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl), and QK-normalization to improve training stability. |
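
QK-normalization normalizes the query and key projections before the attention dot product, which bounds the attention logits and helps stabilize training of large models. A minimal NumPy sketch of the idea (illustrative only, not the model's actual implementation):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize along the feature axis.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def qk_normalized_attention(q, k, v):
    # Normalizing q and k keeps the dot-product logits bounded,
    # no matter how large the projection weights grow.
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because each normalized row has unit RMS, every logit is bounded in magnitude by the square root of the head dimension.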
|
|
|
## Example Usage |
|
```python |
|
!pip install -U keras-hub |
|
!pip install -U keras |
|
``` |
|
|
|
```python

import keras_hub

import numpy as np

# Pretrained Stable Diffusion 3 model.
|
model = keras_hub.models.StableDiffusion3Backbone.from_preset( |
|
"stable_diffusion_3.5_large_turbo" |
|
) |
|
|
|
# Randomly initialized Stable Diffusion 3 model with custom config. |
|
vae = keras_hub.models.VAEBackbone(...) |
|
clip_l = keras_hub.models.CLIPTextEncoder(...) |
|
clip_g = keras_hub.models.CLIPTextEncoder(...) |
|
model = keras_hub.models.StableDiffusion3Backbone( |
|
mmdit_patch_size=2, |
|
mmdit_num_heads=4, |
|
mmdit_hidden_dim=256, |
|
mmdit_depth=4, |
|
mmdit_position_size=192, |
|
vae=vae, |
|
clip_l=clip_l, |
|
clip_g=clip_g, |
|
) |
|
|
|
# Image to image example |
|
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset( |
|
"stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
} |
|
) |
|
|
|
# Generate with batched prompts. |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((2, 512, 512, 3), dtype="float32"), |
|
"prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"], |
|
} |
|
) |
|
|
|
# Generate with different `num_steps`, `guidance_scale` and `strength`. |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
    },
|
num_steps=50, |
|
guidance_scale=5.0, |
|
strength=0.6, |
|
) |
|
|
|
# Generate with `negative_prompts`. |
|
image_to_image.generate(
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
"negative_prompts": "green color", |
|
} |
|
) |
|
|
|
# Inpainting example
|
reference_image = np.ones((1024, 1024, 3), dtype="float32") |
|
reference_mask = np.ones((1024, 1024), dtype="float32") |
|
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset( |
|
"stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
inpaint.generate( |
|
reference_image, |
|
reference_mask, |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
) |
|
|
|
# Generate with batched prompts. |
|
reference_images = np.ones((2, 512, 512, 3), dtype="float32") |
|
reference_mask = np.ones((2, 512, 512), dtype="float32")
|
inpaint.generate( |
|
reference_images, |
|
reference_mask, |
|
["cute wallpaper art of a cat", "cute wallpaper art of a dog"] |
|
) |
|
|
|
# Generate with different `num_steps`, `guidance_scale` and `strength`. |
|
inpaint.generate( |
|
reference_image, |
|
reference_mask, |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
num_steps=50, |
|
guidance_scale=5.0, |
|
strength=0.6, |
|
) |
|
|
|
# Text to image example
|
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset( |
|
"stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
text_to_image.generate( |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" |
|
) |
|
|
|
# Generate with batched prompts. |
|
text_to_image.generate( |
|
["cute wallpaper art of a cat", "cute wallpaper art of a dog"] |
|
) |
|
|
|
# Generate with different `num_steps` and `guidance_scale`. |
|
text_to_image.generate( |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
num_steps=50, |
|
guidance_scale=5.0, |
|
) |
|
|
|
# Generate with `negative_prompts`. |
|
text_to_image.generate( |
|
{ |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
"negative_prompts": "green color", |
|
} |
|
) |
|
``` |
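
The `np.ones(...)` arrays in the examples above are placeholders for real inputs. Below is a small helper sketch for preparing a photo and a rectangular inpainting mask in the same format, assuming the pipelines expect float32 images in [0, 1] (the range the placeholders imply) with masks of 1.0 over the region to regenerate:

```python
import numpy as np

def prepare_image(image_uint8):
    # Convert an HxWx3 uint8 image (e.g. loaded via PIL or imageio)
    # to the float32, [0, 1] format used in the examples above.
    return image_uint8.astype("float32") / 255.0

def make_rect_mask(height, width, top, left, bottom, right):
    # Build an HxW float32 mask that is 1.0 inside the rectangle
    # (the region to repaint) and 0.0 elsewhere.
    mask = np.zeros((height, width), dtype="float32")
    mask[top:bottom, left:right] = 1.0
    return mask
```

The outputs can be passed where the examples use `np.ones((512, 512, 3), dtype="float32")` and `np.ones((512, 512), dtype="float32")`; stack several with `np.stack` to form a batch.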
|
|
|
## Example Usage with Hugging Face URI |
|
|
|
```python |
|
!pip install -U keras-hub |
|
!pip install -U keras |
|
``` |
|
|
|
```python

import keras_hub

import numpy as np

# Pretrained Stable Diffusion 3 model.
|
model = keras_hub.models.StableDiffusion3Backbone.from_preset( |
|
"hf://keras/stable_diffusion_3.5_large_turbo" |
|
) |
|
|
|
# Randomly initialized Stable Diffusion 3 model with custom config. |
|
vae = keras_hub.models.VAEBackbone(...) |
|
clip_l = keras_hub.models.CLIPTextEncoder(...) |
|
clip_g = keras_hub.models.CLIPTextEncoder(...) |
|
model = keras_hub.models.StableDiffusion3Backbone( |
|
mmdit_patch_size=2, |
|
mmdit_num_heads=4, |
|
mmdit_hidden_dim=256, |
|
mmdit_depth=4, |
|
mmdit_position_size=192, |
|
vae=vae, |
|
clip_l=clip_l, |
|
clip_g=clip_g, |
|
) |
|
|
|
# Image to image example |
|
image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset( |
|
"hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
} |
|
) |
|
|
|
# Generate with batched prompts. |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((2, 512, 512, 3), dtype="float32"), |
|
"prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"], |
|
} |
|
) |
|
|
|
# Generate with different `num_steps`, `guidance_scale` and `strength`. |
|
image_to_image.generate( |
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
    },
|
num_steps=50, |
|
guidance_scale=5.0, |
|
strength=0.6, |
|
) |
|
|
|
# Generate with `negative_prompts`. |
|
image_to_image.generate(
|
{ |
|
"images": np.ones((512, 512, 3), dtype="float32"), |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
"negative_prompts": "green color", |
|
} |
|
) |
|
|
|
# Inpainting example
|
reference_image = np.ones((1024, 1024, 3), dtype="float32") |
|
reference_mask = np.ones((1024, 1024), dtype="float32") |
|
inpaint = keras_hub.models.StableDiffusion3Inpaint.from_preset( |
|
"hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
inpaint.generate( |
|
reference_image, |
|
reference_mask, |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
) |
|
|
|
# Generate with batched prompts. |
|
reference_images = np.ones((2, 512, 512, 3), dtype="float32") |
|
reference_mask = np.ones((2, 512, 512), dtype="float32")
|
inpaint.generate( |
|
reference_images, |
|
reference_mask, |
|
["cute wallpaper art of a cat", "cute wallpaper art of a dog"] |
|
) |
|
|
|
# Generate with different `num_steps`, `guidance_scale` and `strength`. |
|
inpaint.generate( |
|
reference_image, |
|
reference_mask, |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
num_steps=50, |
|
guidance_scale=5.0, |
|
strength=0.6, |
|
) |
|
|
|
# Text to image example
|
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset( |
|
"hf://keras/stable_diffusion_3.5_large_turbo", height=512, width=512 |
|
) |
|
text_to_image.generate( |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" |
|
) |
|
|
|
# Generate with batched prompts. |
|
text_to_image.generate( |
|
["cute wallpaper art of a cat", "cute wallpaper art of a dog"] |
|
) |
|
|
|
# Generate with different `num_steps` and `guidance_scale`. |
|
text_to_image.generate( |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
num_steps=50, |
|
guidance_scale=5.0, |
|
) |
|
|
|
# Generate with `negative_prompts`. |
|
text_to_image.generate( |
|
{ |
|
"prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", |
|
"negative_prompts": "green color", |
|
} |
|
) |
|
``` |
|
|