Upload files
- .gitattributes +2 -0
- .gitignore +6 -0
- README.md +139 -13
- contents/alpha_scale.gif +3 -0
- contents/alpha_scale.mp4 +3 -0
- contents/disney_lora.jpg +0 -0
- contents/pop_art.jpg +0 -0
- lora_diffusion/__init__.py +1 -0
- lora_diffusion/cli_lora_add.py +49 -0
- lora_diffusion/lora.py +166 -0
- lora_disney.pt +3 -0
- lora_illust.pt +3 -0
- lora_pop.pt +3 -0
- requirements.txt +3 -3
- run_lora_db.sh +17 -0
- scripts/make_alpha_gifs.ipynb +0 -0
- scripts/run_inference.ipynb +0 -0
- setup.py +25 -0
- train_lora_dreambooth.py +964 -0
.gitattributes
CHANGED
@@ -32,3 +32,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+contents/alpha_scale.gif filter=lfs diff=lfs merge=lfs -text
+contents/alpha_scale.mp4 filter=lfs diff=lfs merge=lfs -text
.gitignore
ADDED
@@ -0,0 +1,6 @@
data_*
output_*
__pycache__
*.pyc
__test*
merged_lora*
README.md
CHANGED
@@ -1,13 +1,139 @@
# Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning

<!-- #region -->
<p align="center">
<img src="contents/alpha_scale.gif">
</p>
<!-- #endregion -->

> Using LORA to fine-tune on an illustration dataset: $W = W_0 + \alpha \Delta W$, where $\alpha$ is the merging ratio. The gif above scales alpha from 0 to 1. Setting alpha to 0 is the same as using the original model, and setting alpha to 1 is the same as using the fully fine-tuned model.

<!-- #region -->
<p align="center">
<img src="contents/disney_lora.jpg">
</p>
<!-- #endregion -->

> "style of sks, baby lion", with the Disney-style LORA model.

<!-- #region -->
<p align="center">
<img src="contents/pop_art.jpg">
</p>
<!-- #endregion -->

> "style of sks, superman", with the pop-art-style LORA model.

## Main Features

- Fine-tune Stable Diffusion models twice as fast as the DreamBooth method, using Low-rank Adaptation.
- Get an insanely small end result, easy to share and download.
- Easy to use, compatible with diffusers.
- Sometimes even better performance than full fine-tuning (extensive comparisons are left as future work).
- Merge checkpoints by merging LORAs.

# Lengthy Introduction

Thanks to the generous work of Stability AI and Huggingface, many people have enjoyed fine-tuning Stable Diffusion models to fit their needs and generate higher-fidelity images. **However, the fine-tuning process is very slow, and it is not easy to find a good balance between the number of steps and the quality of the results.**

Also, the final result (the fully fine-tuned model) is very large. Some people instead work with textual inversion as an alternative. But this is clearly suboptimal: textual inversion only creates a small word embedding, and the final images are not as good as those of a fully fine-tuned model.

Well, what's the alternative? In the LLM domain, researchers have developed efficient fine-tuning methods. LORA, especially, tackles the very problem the community currently has: end users with the open-sourced Stable Diffusion model want to try the various fine-tuned models created by the community, but each model is too large to download and use. LORA instead attempts to fine-tune the "residual" of the model instead of the entire model: i.e., train $\Delta W$ instead of $W$.

$$
W' = W + \Delta W
$$

We can further decompose $\Delta W$ into low-rank matrices: $\Delta W = A B^T$, where $A \in \mathbb{R}^{n \times d}$, $B \in \mathbb{R}^{m \times d}$, $d \ll n$.
This is the key idea of LORA. We can then fine-tune $A$ and $B$ instead of $W$. In the end, you get an insanely small model, as $A$ and $B$ are much smaller than $W$.

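For a quick sense of the savings, here is a minimal, self-contained sketch of the idea in plain PyTorch (not part of this repo's API); the layer size is made up for illustration, while rank 4 matches the `LoraInjectedLinear` module added below:

```python
import torch

# Toy illustration of W' = W + alpha * (A @ B.T) with a rank-4 update:
# the two factors store 2 * 320 * 4 = 2,560 numbers instead of 320 * 320 = 102,400.
n, m, d = 320, 320, 4
W = torch.randn(n, m)             # frozen pretrained weight
A = torch.randn(n, d) * (1 / 16)  # trainable low-rank factor
B = torch.zeros(m, d)             # trainable low-rank factor (zero init => W' == W at start)

alpha = 1.0
W_prime = W + alpha * (A @ B.T)
print(W.numel(), "full parameters vs", A.numel() + B.numel(), "LORA parameters")
```
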
Also, not all of the parameters need tuning: they found that tuning only $Q, K, V, O$ (i.e., the attention layers) of the transformer model is often enough. (This is also why the end result is so small.) This repo follows the same idea.

Enough of the lengthy introduction, let's get to the code.

# Installation

```bash
pip install git+https://github.com/cloneofsimo/lora.git
```

# Getting Started

## Fine-tuning Stable diffusion with LORA.

Basic usage is as follows: prepare sets of $A, B$ matrices in a unet model, and fine-tune them.

```python
from lora_diffusion import inject_trainable_lora, extract_lora_ups_down

...

unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
)
unet.requires_grad_(False)
unet_lora_params, train_names = inject_trainable_lora(unet)  # This will
# turn off all of the gradients of unet, except for the trainable LORA params.
optimizer = optim.Adam(
    itertools.chain(*unet_lora_params, text_encoder.parameters()), lr=1e-4
)
```

An example of this can be found in `train_lora_dreambooth.py`. Run this example with

```bash
run_lora_db.sh
```

## Loading, merging, and interpolating trained LORAs.

We've seen that people have been merging different checkpoints with different ratios, and this seems to be very useful to the community. LORA is extremely easy to merge.

By the nature of LORA, one can interpolate between different fine-tuned models by adding different $A, B$ matrices.

Currently, the LORA CLI has two options: merge a unet with a LORA, or merge a LORA with a LORA.

### Merging unet with LORA

```bash
$ lora_add --path_1 PATH_TO_DIFFUSER_FORMAT_MODEL --path_2 PATH_TO_LORA.PT --mode upl --alpha 1.0 --output_path OUTPUT_PATH
```

`path_1` can be either a local path or a huggingface model name. When adding a LORA to the unet, alpha is the constant below:

$$
W' = W + \alpha \Delta W
$$

So, set alpha to 1.0 to fully add the LORA. If the LORA seems to have too much effect (i.e., overfitted), set alpha to a lower value. If the LORA seems to have too little effect, set alpha higher than 1.0. You can tune these values to your needs.

**Example**

```bash
$ lora_add --path_1 stabilityai/stable-diffusion-2-base --path_2 lora_illust.pt --mode upl --alpha 1.0 --output_path merged_model
```

### Merging LORA with LORA

```bash
$ lora_add --path_1 PATH_TO_LORA.PT --path_2 PATH_TO_LORA.PT --mode lpl --alpha 0.5 --output_path OUTPUT_PATH.PT
```

alpha is the ratio of the first model to the second model, i.e.,

$$
\Delta W = (\alpha A_1 + (1 - \alpha) A_2) (\alpha B_1 + (1 - \alpha) B_2)^T
$$

Set alpha to 0.5 to get the average of the two models. Set alpha closer to 1.0 to get more effect of the first model, and closer to 0.0 to get more effect of the second model.

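Under the hood, `lpl` merging is just an element-wise interpolation of the two saved weight lists (see `lora_diffusion/cli_lora_add.py` below for the actual implementation). A rough sketch, assuming two trained files with the hypothetical names `lora_a.pt` and `lora_b.pt`:

```python
import torch

alpha = 0.5
l1 = torch.load("lora_a.pt")  # alternating up/down weight tensors
l2 = torch.load("lora_b.pt")

# both the up and the down factors are interpolated with the same alpha
merged = [alpha * w1 + (1 - alpha) * w2 for w1, w2 in zip(l1, l2)]
torch.save(merged, "lora_merged.pt")
```
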
**Example**

```bash
$ lora_add --path_1 lora_illust.pt --path_2 lora_pop.pt --alpha 0.3 --output_path lora_merged.pt
```

### Making Inference with trained LORA

Check out `scripts/run_inference.ipynb` for an example of how to run inference with a trained LORA.
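
If you prefer plain Python over the notebook, the rough shape of inference looks like this (a sketch using the functions from `lora_diffusion/lora.py` below; the model id, LORA path, and prompt are placeholders for your own setup):

```python
import torch
from diffusers import StableDiffusionPipeline
from lora_diffusion import monkeypatch_lora, tune_lora_scale

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

# swap the attention Linear layers for LoraInjectedLinear and load the trained weights
monkeypatch_lora(pipe.unet, torch.load("lora_weight.pt"))
tune_lora_scale(pipe.unet, 0.7)  # alpha: 0.0 = original model, 1.0 = full LORA effect

image = pipe("style of sks, baby lion", num_inference_steps=50).images[0]
image.save("baby_lion.jpg")
```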
contents/alpha_scale.gif
ADDED
Git LFS Details
contents/alpha_scale.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ad74f5f69d99bfcbeee1d4d2b3900ac1ca7ff83fba5ddf8269ffed8a56c9c6e
size 5247140
contents/disney_lora.jpg
ADDED
contents/pop_art.jpg
ADDED
lora_diffusion/__init__.py
ADDED
@@ -0,0 +1 @@
from .lora import *
lora_diffusion/cli_lora_add.py
ADDED
@@ -0,0 +1,49 @@
from typing import Literal, Union, Dict

import fire
from diffusers import StableDiffusionPipeline

import torch
from .lora import tune_lora_scale, weight_apply_lora


def add(
    path_1: str,
    path_2: str,
    output_path: str = "./merged_lora.pt",
    alpha: float = 0.5,
    mode: Literal["lpl", "upl"] = "lpl",
):
    if mode == "lpl":
        # lpl: merge LORA with LORA. Saved weight lists alternate up/down
        # matrices, so pair them and interpolate each pair with ratio alpha.
        out_list = []
        l1 = torch.load(path_1)
        l2 = torch.load(path_2)

        l1pairs = zip(l1[::2], l1[1::2])
        l2pairs = zip(l2[::2], l2[1::2])

        for (x1, y1), (x2, y2) in zip(l1pairs, l2pairs):
            x1.data = alpha * x1.data + (1 - alpha) * x2.data
            y1.data = alpha * y1.data + (1 - alpha) * y2.data

            out_list.append(x1)
            out_list.append(y1)

        torch.save(out_list, output_path)

    elif mode == "upl":
        # upl: merge unet with LORA. Bake the LORA residual into the unet of a
        # diffusers pipeline and save the merged pipeline as a directory.
        loaded_pipeline = StableDiffusionPipeline.from_pretrained(
            path_1,
        ).to("cpu")

        weight_apply_lora(loaded_pipeline.unet, torch.load(path_2), alpha=alpha)

        if output_path.endswith(".pt"):
            output_path = output_path[:-3]

        loaded_pipeline.save_pretrained(output_path)


def main():
    fire.Fire(add)
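
For reference, the `add` entry point above can also be called directly from Python instead of through the `lora_add` console script; a minimal sketch using the example weights shipped in this commit:

```python
from lora_diffusion.cli_lora_add import add

# merge the two bundled example LORAs with a 0.3 / 0.7 ratio
add("lora_illust.pt", "lora_pop.pt", output_path="lora_merged.pt", alpha=0.3, mode="lpl")
```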
lora_diffusion/lora.py
ADDED
@@ -0,0 +1,166 @@
import math
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np
import PIL
import torch
import torch.nn.functional as F

import torch.nn as nn


class LoraInjectedLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        self.lora_down = nn.Linear(in_features, 4, bias=False)
        self.lora_up = nn.Linear(4, out_features, bias=False)
        self.scale = 1.0

        nn.init.normal_(self.lora_down.weight, std=1 / 16)
        nn.init.zeros_(self.lora_up.weight)

    def forward(self, input):
        return self.linear(input) + self.lora_up(self.lora_down(input)) * self.scale


def inject_trainable_lora(
    model: nn.Module, target_replace_module: List[str] = ["CrossAttention", "Attention"]
):
    """
    inject lora into model, and returns lora parameter groups.
    """

    require_grad_params = []
    names = []

    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:

            for name, _child_module in _module.named_modules():
                if _child_module.__class__.__name__ == "Linear":

                    weight = _child_module.weight
                    bias = _child_module.bias
                    _tmp = LoraInjectedLinear(
                        _child_module.in_features,
                        _child_module.out_features,
                        _child_module.bias is not None,
                    )
                    _tmp.linear.weight = weight
                    if bias is not None:
                        _tmp.linear.bias = bias

                    # switch the module
                    _module._modules[name] = _tmp

                    require_grad_params.append(
                        _module._modules[name].lora_up.parameters()
                    )
                    require_grad_params.append(
                        _module._modules[name].lora_down.parameters()
                    )

                    _module._modules[name].lora_up.weight.requires_grad = True
                    _module._modules[name].lora_down.weight.requires_grad = True
                    names.append(name)

    return require_grad_params, names


def extract_lora_ups_down(model, target_replace_module=["CrossAttention", "Attention"]):

    loras = []

    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:
            for _child_module in _module.modules():
                if _child_module.__class__.__name__ == "LoraInjectedLinear":
                    loras.append((_child_module.lora_up, _child_module.lora_down))
    if len(loras) == 0:
        raise ValueError("No lora injected.")
    return loras


def save_lora_weight(model, path="./lora.pt"):
    weights = []
    for _up, _down in extract_lora_ups_down(model):
        weights.append(_up.weight)
        weights.append(_down.weight)

    torch.save(weights, path)


def save_lora_as_json(model, path="./lora.json"):
    weights = []
    for _up, _down in extract_lora_ups_down(model):
        weights.append(_up.weight.detach().cpu().numpy().tolist())
        weights.append(_down.weight.detach().cpu().numpy().tolist())

    import json

    with open(path, "w") as f:
        json.dump(weights, f)


def weight_apply_lora(
    model, loras, target_replace_module=["CrossAttention", "Attention"], alpha=1.0
):

    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:
            for _child_module in _module.modules():
                if _child_module.__class__.__name__ == "Linear":

                    weight = _child_module.weight

                    up_weight = loras.pop(0).detach().to(weight.device)
                    down_weight = loras.pop(0).detach().to(weight.device)

                    # W <- W + U * D
                    weight = weight + alpha * (up_weight @ down_weight).type(
                        weight.dtype
                    )
                    _child_module.weight = nn.Parameter(weight)


def monkeypatch_lora(
    model, loras, target_replace_module=["CrossAttention", "Attention"]
):
    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:
            for name, _child_module in _module.named_modules():
                if _child_module.__class__.__name__ == "Linear":

                    weight = _child_module.weight
                    bias = _child_module.bias
                    _tmp = LoraInjectedLinear(
                        _child_module.in_features,
                        _child_module.out_features,
                        _child_module.bias is not None,
                    )
                    _tmp.linear.weight = weight

                    if bias is not None:
                        _tmp.linear.bias = bias

                    # switch the module
                    _module._modules[name] = _tmp

                    up_weight = loras.pop(0)
                    down_weight = loras.pop(0)

                    _module._modules[name].lora_up.weight = nn.Parameter(
                        up_weight.type(weight.dtype)
                    )
                    _module._modules[name].lora_down.weight = nn.Parameter(
                        down_weight.type(weight.dtype)
                    )

                    _module._modules[name].to(weight.device)


def tune_lora_scale(model, alpha: float = 1.0):
    for _module in model.modules():
        if _module.__class__.__name__ == "LoraInjectedLinear":
            _module.scale = alpha
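
A small sanity check on the module above (an illustrative sketch, not part of the file): because `lora_up` is zero-initialized, an injected layer initially reproduces the frozen `Linear` exactly, so training starts from the original model's behavior.

```python
import torch
import torch.nn as nn
from lora_diffusion.lora import LoraInjectedLinear

torch.manual_seed(0)
base = nn.Linear(320, 320)
lora = LoraInjectedLinear(320, 320, bias=True)
lora.linear.weight = base.weight  # share the frozen weight and bias
lora.linear.bias = base.bias

x = torch.randn(2, 320)
print(torch.allclose(base(x), lora(x)))  # True: scale * up(down(x)) == 0 at init
```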
lora_disney.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72f687f810b86bb8cc64d2ece59886e2e96d29e3f57f97340ee147d168b8a5fe
size 3397249
lora_illust.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7f6acb0bc0cd5f96299be7839f89f58727e2666e58861e55866ea02125c97aba
size 3397249
lora_pop.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:18a1565852a08cfcff63e90670286c9427e3958f57de9b84e3f8b2c9a3a14b6c
size 3397249
requirements.txt
CHANGED
@@ -1,4 +1,4 @@
-diffusers
+diffusers>=0.9.0
transformers
+scipy
+ftfy
run_lora_db.sh
ADDED
@@ -0,0 +1,17 @@
#https://github.com/huggingface/diffusers/tree/main/examples/dreambooth
export MODEL_NAME="stabilityai/stable-diffusion-2-1-base"
export INSTANCE_DIR="./data_example"
export OUTPUT_DIR="./output_example"

accelerate launch train_lora_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="style of sks" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=30000
scripts/make_alpha_gifs.ipynb
ADDED
The diff for this file is too large to render.
scripts/run_inference.ipynb
ADDED
The diff for this file is too large to render.
setup.py
ADDED
@@ -0,0 +1,25 @@
import os

import pkg_resources
from setuptools import find_packages, setup

setup(
    name="lora_diffusion",
    py_modules=["lora_diffusion"],
    version="0.0.1",
    description="Low Rank Adaptation for Diffusion Models. Works with Stable Diffusion out-of-the-box.",
    author="Simo Ryu",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "lora_add = lora_diffusion.cli_lora_add:main",
        ],
    },
    install_requires=[
        str(r)
        for r in pkg_resources.parse_requirements(
            open(os.path.join(os.path.dirname(__file__), "requirements.txt"))
        )
    ],
    include_package_data=True,
)
train_lora_dreambooth.py
ADDED
@@ -0,0 +1,964 @@
# Bootstrapped from:
# https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py

import argparse
import hashlib
import itertools
import math
import os
from pathlib import Path
from typing import Optional

import torch
import torch.nn.functional as F
import torch.utils.checkpoint


from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import (
    AutoencoderKL,
    DDPMScheduler,
    StableDiffusionPipeline,
    UNet2DConditionModel,
)
from diffusers.optimization import get_scheduler
from huggingface_hub import HfFolder, Repository, whoami

from tqdm.auto import tqdm
from transformers import CLIPTextModel, CLIPTokenizer

from lora_diffusion import (
    inject_trainable_lora,
    save_lora_weight,
    extract_lora_ups_down,
)

from torch.utils.data import Dataset
from PIL import Image
from torchvision import transforms

from pathlib import Path

import random
import re


class DreamBoothDataset(Dataset):
    """
    A dataset to prepare the instance and class images with the prompts for fine-tuning the model.
    It pre-processes the images and the tokenizes prompts.
    """

    def __init__(
        self,
        instance_data_root,
        instance_prompt,
        tokenizer,
        class_data_root=None,
        class_prompt=None,
        size=512,
        center_crop=False,
    ):
        self.size = size
        self.center_crop = center_crop
        self.tokenizer = tokenizer

        self.instance_data_root = Path(instance_data_root)
        if not self.instance_data_root.exists():
            raise ValueError("Instance images root doesn't exists.")

        self.instance_images_path = list(Path(instance_data_root).iterdir())
        self.num_instance_images = len(self.instance_images_path)
        self.instance_prompt = instance_prompt
        self._length = self.num_instance_images

        if class_data_root is not None:
            self.class_data_root = Path(class_data_root)
            self.class_data_root.mkdir(parents=True, exist_ok=True)
            self.class_images_path = list(self.class_data_root.iterdir())
            self.num_class_images = len(self.class_images_path)
            self._length = max(self.num_class_images, self.num_instance_images)
            self.class_prompt = class_prompt
        else:
            self.class_data_root = None

        self.image_transforms = transforms.Compose(
            [
                transforms.Resize(
                    size, interpolation=transforms.InterpolationMode.BILINEAR
                ),
                transforms.CenterCrop(size)
                if center_crop
                else transforms.RandomCrop(size),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        example = {}
        instance_image = Image.open(
            self.instance_images_path[index % self.num_instance_images]
        )
        if not instance_image.mode == "RGB":
            instance_image = instance_image.convert("RGB")
        example["instance_images"] = self.image_transforms(instance_image)
        example["instance_prompt_ids"] = self.tokenizer(
            self.instance_prompt,
            padding="do_not_pad",
            truncation=True,
            max_length=self.tokenizer.model_max_length,
        ).input_ids

        if self.class_data_root:
            class_image = Image.open(
                self.class_images_path[index % self.num_class_images]
            )
            if not class_image.mode == "RGB":
                class_image = class_image.convert("RGB")
            example["class_images"] = self.image_transforms(class_image)
            example["class_prompt_ids"] = self.tokenizer(
                self.class_prompt,
                padding="do_not_pad",
                truncation=True,
                max_length=self.tokenizer.model_max_length,
            ).input_ids

        return example


class DreamBoothLabled(Dataset):
    """
    A dataset to prepare the instance and class images with the prompts for fine-tuning the model.
    It pre-processes the images and the tokenizes prompts.
    """

    def __init__(
        self,
        instance_data_root,
        instance_prompt,
        tokenizer,
        class_data_root=None,
        class_prompt=None,
        size=512,
        center_crop=False,
    ):
        self.size = size
        self.center_crop = center_crop
        self.tokenizer = tokenizer

        self.instance_data_root = Path(instance_data_root)
        if not self.instance_data_root.exists():
            raise ValueError("Instance images root doesn't exists.")

        self.instance_images_path = list(Path(instance_data_root).iterdir())
        self.num_instance_images = len(self.instance_images_path)
        self.instance_prompt = instance_prompt
        self._length = self.num_instance_images

        if class_data_root is not None:
            self.class_data_root = Path(class_data_root)
            self.class_data_root.mkdir(parents=True, exist_ok=True)
            self.class_images_path = list(self.class_data_root.iterdir())
            self.num_class_images = len(self.class_images_path)
            self._length = max(self.num_class_images, self.num_instance_images)
            self.class_prompt = class_prompt
        else:
            self.class_data_root = None

        self.image_transforms = transforms.Compose(
            [
                transforms.Resize(
                    size, interpolation=transforms.InterpolationMode.BILINEAR
                ),
                transforms.CenterCrop(size)
                if center_crop
                else transforms.RandomCrop(size),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        example = {}
        instance_image = Image.open(
            self.instance_images_path[index % self.num_instance_images]
        )

        instance_prompt = (
            str(self.instance_images_path[index % self.num_instance_images])
            .split("/")[-1]
            .split(".")[0]
            .replace("-", " ")
        )
        # remove numbers in prompt
        instance_prompt = re.sub(r"\d+", "", instance_prompt)
        # print(instance_prompt)

        _svg = random.choice(["svg", "flat color", "vector illustration", "sks"])
        instance_prompt = f"{instance_prompt}, style of {_svg}"

        if not instance_image.mode == "RGB":
            instance_image = instance_image.convert("RGB")
        example["instance_images"] = self.image_transforms(instance_image)
        example["instance_prompt_ids"] = self.tokenizer(
            instance_prompt,
            padding="do_not_pad",
            truncation=True,
            max_length=self.tokenizer.model_max_length,
        ).input_ids

        if self.class_data_root:
            class_image = Image.open(
                self.class_images_path[index % self.num_class_images]
            )
            if not class_image.mode == "RGB":
                class_image = class_image.convert("RGB")
            example["class_images"] = self.image_transforms(class_image)
            example["class_prompt_ids"] = self.tokenizer(
                self.class_prompt,
                padding="do_not_pad",
                truncation=True,
                max_length=self.tokenizer.model_max_length,
            ).input_ids

        return example


class PromptDataset(Dataset):
    "A simple dataset to prepare the prompts to generate class images on multiple GPUs."

    def __init__(self, prompt, num_samples):
        self.prompt = prompt
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        example = {}
        example["prompt"] = self.prompt
        example["index"] = index
        return example


logger = get_logger(__name__)


def parse_args(input_args=None):
    parser = argparse.ArgumentParser(description="Simple example of a training script.")
    parser.add_argument(
        "--pretrained_model_name_or_path",
        type=str,
        default=None,
        required=True,
        help="Path to pretrained model or model identifier from huggingface.co/models.",
    )
    parser.add_argument(
        "--revision",
        type=str,
        default=None,
        required=False,
        help="Revision of pretrained model identifier from huggingface.co/models.",
    )
    parser.add_argument(
        "--tokenizer_name",
        type=str,
        default=None,
        help="Pretrained tokenizer name or path if not the same as model_name",
    )
    parser.add_argument(
        "--instance_data_dir",
        type=str,
        default=None,
        required=True,
        help="A folder containing the training data of instance images.",
    )
    parser.add_argument(
        "--class_data_dir",
        type=str,
        default=None,
        required=False,
        help="A folder containing the training data of class images.",
    )
    parser.add_argument(
        "--instance_prompt",
        type=str,
        default=None,
        required=True,
        help="The prompt with identifier specifying the instance",
    )
    parser.add_argument(
        "--class_prompt",
        type=str,
        default=None,
        help="The prompt to specify images in the same class as provided instance images.",
    )
    parser.add_argument(
        "--with_prior_preservation",
        default=False,
        action="store_true",
        help="Flag to add prior preservation loss.",
    )
    parser.add_argument(
        "--prior_loss_weight",
        type=float,
        default=1.0,
        help="The weight of prior preservation loss.",
    )
    parser.add_argument(
        "--num_class_images",
        type=int,
        default=100,
        help=(
            "Minimal class images for prior preservation loss. If not have enough images, additional images will be"
            " sampled with class_prompt."
        ),
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default="text-inversion-model",
        help="The output directory where the model predictions and checkpoints will be written.",
    )
    parser.add_argument(
        "--seed", type=int, default=None, help="A seed for reproducible training."
    )
    parser.add_argument(
        "--resolution",
        type=int,
        default=512,
        help=(
            "The resolution for input images, all the images in the train/validation dataset will be resized to this"
            " resolution"
        ),
    )
    parser.add_argument(
        "--center_crop",
        action="store_true",
        help="Whether to center crop images before resizing to resolution",
    )
    parser.add_argument(
        "--train_text_encoder",
        action="store_true",
        help="Whether to train the text encoder",
    )
    parser.add_argument(
        "--train_batch_size",
        type=int,
        default=4,
        help="Batch size (per device) for the training dataloader.",
    )
    parser.add_argument(
        "--sample_batch_size",
        type=int,
        default=4,
        help="Batch size (per device) for sampling images.",
    )
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument(
        "--max_train_steps",
        type=int,
        default=None,
        help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
    )
    parser.add_argument(
        "--save_steps",
        type=int,
        default=500,
        help="Save checkpoint every X updates steps.",
    )
    parser.add_argument(
        "--gradient_accumulation_steps",
        type=int,
        default=1,
        help="Number of updates steps to accumulate before performing a backward/update pass.",
    )
    parser.add_argument(
        "--gradient_checkpointing",
        action="store_true",
        help="Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.",
    )
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=5e-6,
        help="Initial learning rate (after the potential warmup period) to use.",
    )
    parser.add_argument(
        "--scale_lr",
        action="store_true",
        default=False,
        help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
    )
    parser.add_argument(
        "--lr_scheduler",
        type=str,
        default="constant",
        help=(
            'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
            ' "constant", "constant_with_warmup"]'
        ),
    )
    parser.add_argument(
        "--lr_warmup_steps",
        type=int,
        default=500,
        help="Number of steps for the warmup in the lr scheduler.",
    )
    parser.add_argument(
        "--use_8bit_adam",
        action="store_true",
        help="Whether or not to use 8-bit Adam from bitsandbytes.",
    )
    parser.add_argument(
        "--adam_beta1",
        type=float,
        default=0.9,
        help="The beta1 parameter for the Adam optimizer.",
    )
    parser.add_argument(
        "--adam_beta2",
        type=float,
        default=0.999,
        help="The beta2 parameter for the Adam optimizer.",
    )
    parser.add_argument(
        "--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use."
    )
    parser.add_argument(
        "--adam_epsilon",
        type=float,
        default=1e-08,
        help="Epsilon value for the Adam optimizer",
    )
    parser.add_argument(
        "--max_grad_norm", default=1.0, type=float, help="Max gradient norm."
    )
    parser.add_argument(
        "--push_to_hub",
        action="store_true",
        help="Whether or not to push the model to the Hub.",
    )
    parser.add_argument(
        "--hub_token",
        type=str,
        default=None,
        help="The token to use to push to the Model Hub.",
    )
    parser.add_argument(
        "--hub_model_id",
        type=str,
        default=None,
        help="The name of the repository to keep in sync with the local `output_dir`.",
    )
    parser.add_argument(
        "--logging_dir",
        type=str,
        default="logs",
        help=(
            "[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
            " *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
        ),
    )
    parser.add_argument(
        "--mixed_precision",
        type=str,
        default=None,
        choices=["no", "fp16", "bf16"],
        help=(
            "Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
            " 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
            " flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
        ),
    )
    parser.add_argument(
        "--local_rank",
        type=int,
        default=-1,
        help="For distributed training: local_rank",
    )

    if input_args is not None:
        args = parser.parse_args(input_args)
    else:
        args = parser.parse_args()

    env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
    if env_local_rank != -1 and env_local_rank != args.local_rank:
        args.local_rank = env_local_rank

    if args.with_prior_preservation:
        if args.class_data_dir is None:
            raise ValueError("You must specify a data directory for class images.")
        if args.class_prompt is None:
            raise ValueError("You must specify prompt for class images.")
    else:
        if args.class_data_dir is not None:
            logger.warning(
                "You need not use --class_data_dir without --with_prior_preservation."
            )
        if args.class_prompt is not None:
            logger.warning(
                "You need not use --class_prompt without --with_prior_preservation."
            )

    return args


def get_full_repo_name(
    model_id: str, organization: Optional[str] = None, token: Optional[str] = None
):
    if token is None:
        token = HfFolder.get_token()
    if organization is None:
        username = whoami(token)["name"]
        return f"{username}/{model_id}"
    else:
        return f"{organization}/{model_id}"


def main(args):
    logging_dir = Path(args.output_dir, args.logging_dir)

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
        log_with="tensorboard",
        logging_dir=logging_dir,
    )

    # Currently, it's not possible to do gradient accumulation when training two models with accelerate.accumulate
    # This will be enabled soon in accelerate. For now, we don't allow gradient accumulation when training two models.
    # TODO (patil-suraj): Remove this check when gradient accumulation with two models is enabled in accelerate.
    if (
        args.train_text_encoder
        and args.gradient_accumulation_steps > 1
        and accelerator.num_processes > 1
    ):
        raise ValueError(
            "Gradient accumulation is not supported when training the text encoder in distributed training. "
            "Please set gradient_accumulation_steps to 1. This feature will be supported in the future."
        )

    if args.seed is not None:
        set_seed(args.seed)

    if args.with_prior_preservation:
        class_images_dir = Path(args.class_data_dir)
        if not class_images_dir.exists():
            class_images_dir.mkdir(parents=True)
        cur_class_images = len(list(class_images_dir.iterdir()))

        if cur_class_images < args.num_class_images:
            torch_dtype = (
                torch.float16 if accelerator.device.type == "cuda" else torch.float32
            )
            pipeline = StableDiffusionPipeline.from_pretrained(
                args.pretrained_model_name_or_path,
                torch_dtype=torch_dtype,
                safety_checker=None,
                revision=args.revision,
            )
            pipeline.set_progress_bar_config(disable=True)

            num_new_images = args.num_class_images - cur_class_images
            logger.info(f"Number of class images to sample: {num_new_images}.")

            sample_dataset = PromptDataset(args.class_prompt, num_new_images)
            sample_dataloader = torch.utils.data.DataLoader(
                sample_dataset, batch_size=args.sample_batch_size
            )

            sample_dataloader = accelerator.prepare(sample_dataloader)
            pipeline.to(accelerator.device)

            for example in tqdm(
                sample_dataloader,
                desc="Generating class images",
                disable=not accelerator.is_local_main_process,
            ):
                images = pipeline(example["prompt"]).images

                for i, image in enumerate(images):
                    hash_image = hashlib.sha1(image.tobytes()).hexdigest()
                    image_filename = (
                        class_images_dir
                        / f"{example['index'][i] + cur_class_images}-{hash_image}.jpg"
                    )
                    image.save(image_filename)

            del pipeline
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    # Handle the repository creation
    if accelerator.is_main_process:
        if args.push_to_hub:
            if args.hub_model_id is None:
                repo_name = get_full_repo_name(
                    Path(args.output_dir).name, token=args.hub_token
                )
            else:
                repo_name = args.hub_model_id
            repo = Repository(args.output_dir, clone_from=repo_name)

            with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
                if "step_*" not in gitignore:
                    gitignore.write("step_*\n")
                if "epoch_*" not in gitignore:
                    gitignore.write("epoch_*\n")
        elif args.output_dir is not None:
            os.makedirs(args.output_dir, exist_ok=True)

    # Load the tokenizer
    if args.tokenizer_name:
        tokenizer = CLIPTokenizer.from_pretrained(
            args.tokenizer_name,
            revision=args.revision,
        )
    elif args.pretrained_model_name_or_path:
        tokenizer = CLIPTokenizer.from_pretrained(
            args.pretrained_model_name_or_path,
            subfolder="tokenizer",
            revision=args.revision,
        )

    # Load models and create wrapper for stable diffusion
    text_encoder = CLIPTextModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="text_encoder",
        revision=args.revision,
    )
    vae = AutoencoderKL.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="vae",
        revision=args.revision,
    )
    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )
    unet.requires_grad_(False)
    unet_lora_params, train_names = inject_trainable_lora(unet)

    for _up, _down in extract_lora_ups_down(unet):
        print(_up.weight)
        print(_down.weight)
        break

    vae.requires_grad_(False)
    if not args.train_text_encoder:
        text_encoder.requires_grad_(False)

    if args.gradient_checkpointing:
        unet.enable_gradient_checkpointing()
        if args.train_text_encoder:
            text_encoder.gradient_checkpointing_enable()

    if args.scale_lr:
        args.learning_rate = (
            args.learning_rate
            * args.gradient_accumulation_steps
            * args.train_batch_size
            * accelerator.num_processes
        )

    # Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
    if args.use_8bit_adam:
        try:
            import bitsandbytes as bnb
        except ImportError:
            raise ImportError(
                "To use 8-bit Adam, please install the bitsandbytes library: `pip install bitsandbytes`."
            )

        optimizer_class = bnb.optim.AdamW8bit
    else:
        optimizer_class = torch.optim.AdamW

    params_to_optimize = (
        itertools.chain(*unet_lora_params, text_encoder.parameters())
        if args.train_text_encoder
        else itertools.chain(*unet_lora_params)
    )
    optimizer = optimizer_class(
        params_to_optimize,
        lr=args.learning_rate,
        betas=(args.adam_beta1, args.adam_beta2),
        weight_decay=args.adam_weight_decay,
        eps=args.adam_epsilon,
    )

    noise_scheduler = DDPMScheduler.from_config(
        args.pretrained_model_name_or_path, subfolder="scheduler"
    )

    train_dataset = DreamBoothDataset(
        instance_data_root=args.instance_data_dir,
        instance_prompt=args.instance_prompt,
        class_data_root=args.class_data_dir if args.with_prior_preservation else None,
        class_prompt=args.class_prompt,
        tokenizer=tokenizer,
        size=args.resolution,
        center_crop=args.center_crop,
    )

    def collate_fn(examples):
        input_ids = [example["instance_prompt_ids"] for example in examples]
        pixel_values = [example["instance_images"] for example in examples]

        # Concat class and instance examples for prior preservation.
        # We do this to avoid doing two forward passes.
        if args.with_prior_preservation:
            input_ids += [example["class_prompt_ids"] for example in examples]
            pixel_values += [example["class_images"] for example in examples]

        pixel_values = torch.stack(pixel_values)
        pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()

        input_ids = tokenizer.pad(
            {"input_ids": input_ids},
            padding="max_length",
            max_length=tokenizer.model_max_length,
            return_tensors="pt",
        ).input_ids

        batch = {
            "input_ids": input_ids,
            "pixel_values": pixel_values,
        }
        return batch

    train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=args.train_batch_size,
        shuffle=True,
        collate_fn=collate_fn,
        num_workers=1,
    )

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(
        len(train_dataloader) / args.gradient_accumulation_steps
    )
    if args.max_train_steps is None:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
        overrode_max_train_steps = True

    lr_scheduler = get_scheduler(
        args.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
        num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
    )

    if args.train_text_encoder:
        (
            unet,
            text_encoder,
            optimizer,
            train_dataloader,
            lr_scheduler,
        ) = accelerator.prepare(
            unet, text_encoder, optimizer, train_dataloader, lr_scheduler
        )
    else:
        unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, optimizer, train_dataloader, lr_scheduler
        )

    weight_dtype = torch.float32
    if accelerator.mixed_precision == "fp16":
        weight_dtype = torch.float16
    elif accelerator.mixed_precision == "bf16":
        weight_dtype = torch.bfloat16

    # Move text_encode and vae to gpu.
    # For mixed precision training we cast the text_encoder and vae weights to half-precision
    # as these models are only used for inference, keeping weights in full precision is not required.
    vae.to(accelerator.device, dtype=weight_dtype)
    if not args.train_text_encoder:
        text_encoder.to(accelerator.device, dtype=weight_dtype)

    # We need to recalculate our total training steps as the size of the training dataloader may have changed.
    num_update_steps_per_epoch = math.ceil(
        len(train_dataloader) / args.gradient_accumulation_steps
    )
    if overrode_max_train_steps:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    # Afterwards we recalculate our number of training epochs
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)

    # We need to initialize the trackers we use, and also store our configuration.
    # The trackers initializes automatically on the main process.
    if accelerator.is_main_process:
        accelerator.init_trackers("dreambooth", config=vars(args))

    # Train!
    total_batch_size = (
        args.train_batch_size
        * accelerator.num_processes
        * args.gradient_accumulation_steps
    )

    print("***** Running training *****")
    print(f"  Num examples = {len(train_dataset)}")
    print(f"  Num batches each epoch = {len(train_dataloader)}")
    print(f"  Num Epochs = {args.num_train_epochs}")
    print(f"  Instantaneous batch size per device = {args.train_batch_size}")
    print(
        f"  Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}"
    )
    print(f"  Gradient Accumulation steps = {args.gradient_accumulation_steps}")
    print(f"  Total optimization steps = {args.max_train_steps}")
    # Only show the progress bar once on each machine.
    progress_bar = tqdm(
        range(args.max_train_steps), disable=not accelerator.is_local_main_process
    )
    progress_bar.set_description("Steps")
    global_step = 0

    for epoch in range(args.num_train_epochs):
        unet.train()
        if args.train_text_encoder:
            text_encoder.train()
        for step, batch in enumerate(train_dataloader):

            # Convert images to latent space
            latents = vae.encode(
                batch["pixel_values"].to(dtype=weight_dtype)
            ).latent_dist.sample()
            latents = latents * 0.18215

            # Sample noise that we'll add to the latents
            noise = torch.randn_like(latents)
            bsz = latents.shape[0]
            # Sample a random timestep for each image
            timesteps = torch.randint(
                0,
                noise_scheduler.config.num_train_timesteps,
                (bsz,),
                device=latents.device,
            )
            timesteps = timesteps.long()

            # Add noise to the latents according to the noise magnitude at each timestep
            # (this is the forward diffusion process)
            noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

            # Get the text embedding for conditioning
            encoder_hidden_states = text_encoder(batch["input_ids"])[0]

            # Predict the noise residual
            model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

            # Get the target for loss depending on the prediction type
            if noise_scheduler.config.prediction_type == "epsilon":
                target = noise
            elif noise_scheduler.config.prediction_type == "v_prediction":
                target = noise_scheduler.get_velocity(latents, noise, timesteps)
            else:
                raise ValueError(
                    f"Unknown prediction type {noise_scheduler.config.prediction_type}"
                )

            if args.with_prior_preservation:
                # Chunk the noise and model_pred into two parts and compute the loss on each part separately.
                model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
                target, target_prior = torch.chunk(target, 2, dim=0)

                # Compute instance loss
                loss = (
                    F.mse_loss(model_pred.float(), target.float(), reduction="none")
                    .mean([1, 2, 3])
                    .mean()
                )

                # Compute prior loss
                prior_loss = F.mse_loss(
                    model_pred_prior.float(), target_prior.float(), reduction="mean"
                )

                # Add the prior loss to the instance loss.
                loss = loss + args.prior_loss_weight * prior_loss
            else:
                loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")

            accelerator.backward(loss)
            if accelerator.sync_gradients:
                params_to_clip = (
                    itertools.chain(unet.parameters(), text_encoder.parameters())
                    if args.train_text_encoder
                    else unet.parameters()
                )
                accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
            optimizer.step()
            lr_scheduler.step()
            progress_bar.update(1)
            optimizer.zero_grad()

            # Checks if the accelerator has performed an optimization step behind the scenes
            if accelerator.sync_gradients:

                global_step += 1

                if global_step % args.save_steps == 0:
                    if accelerator.is_main_process:
                        pipeline = StableDiffusionPipeline.from_pretrained(
                            args.pretrained_model_name_or_path,
                            unet=accelerator.unwrap_model(unet),
                            text_encoder=accelerator.unwrap_model(text_encoder),
                            revision=args.revision,
                        )

                        save_lora_weight(pipeline.unet, args.output_dir + "/lora_weight.pt")

            logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
            progress_bar.set_postfix(**logs)
            accelerator.log(logs, step=global_step)

            if global_step >= args.max_train_steps:
                break

    accelerator.wait_for_everyone()

    # Create the pipeline using using the trained modules and save it.
    if accelerator.is_main_process:
        pipeline = StableDiffusionPipeline.from_pretrained(
            args.pretrained_model_name_or_path,
            unet=accelerator.unwrap_model(unet),
            text_encoder=accelerator.unwrap_model(text_encoder),
            revision=args.revision,
        )

        print("\n\nLora TRAINING DONE!\n\n")

        save_lora_weight(pipeline.unet, args.output_dir + "/lora_weight.pt")

        for _up, _down in extract_lora_ups_down(pipeline.unet):
            print("First Layer's Up Weight is now : ", _up.weight)
            print("First Layer's Down Weight is now : ", _down.weight)
            break

        if args.push_to_hub:
            repo.push_to_hub(
                commit_message="End of training", blocking=False, auto_lfs_prune=True
            )

    accelerator.end_training()


if __name__ == "__main__":
    args = parse_args()
    main(args)
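
After training, the script above writes the LORA weights to `OUTPUT_DIR/lora_weight.pt` as a flat list of alternating up/down tensors (see `save_lora_weight` in `lora_diffusion/lora.py`). A quick way to inspect that file, assuming the default paths from `run_lora_db.sh`:

```python
import torch

weights = torch.load("./output_example/lora_weight.pt", map_location="cpu")
print(len(weights) // 2, "injected linear layers")
up, down = weights[0], weights[1]
print(tuple(up.shape), tuple(down.shape))  # e.g. (out_features, 4) and (4, in_features)
```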