Akirashindo39 committed on
Commit f738322 · verified · 1 Parent(s): 24b312e

Update README.md

Files changed (1)
  1. README.md +61 -14
README.md CHANGED
@@ -16,28 +16,75 @@ tags:
  should probably proofread and complete it, then remove this comment. -->
 
 
- # LoRA text2image fine-tuning - Akirashindo39/kanji-diffusion-v1-4-kanjidic2
- These are LoRA adaption weights for CompVis/stable-diffusion-v1-4. The weights were fine-tuned on the Akirashindo39/KANJIDIC2 dataset. You can find some example images in the following.
 
- ![img_0](./image_0.png)
- ![img_1](./image_1.png)
- ![img_2](./image_2.png)
- ![img_3](./image_3.png)
 
 
 
- ## Intended uses & limitations
 
- #### How to use
 
- ```python
- # TODO: add an example code snippet for running this diffusion pipeline
- ```
 
- #### Limitations and bias
 
- [TODO: provide examples of latent issues and potential remediations]
 
  ## Training details
 
- [TODO: describe the data used to train the model]

  should probably proofread and complete it, then remove this comment. -->
 
 
+ # Akirashindo39/kanji-diffusion-v1-4-kanjidic2
+ This model is a text-to-image diffusion model that generates ("hallucinates") novel Kanji-like characters from an English prompt.
 
+ #### Fine-tuned Model Details
+ - Developed by: Akira Shindo
+ - Model type: Diffusion-based text-to-image generation model, LoRA fine-tuned from the [Stable Diffusion v1.4](https://github.com/CompVis/stable-diffusion) model on the [Akirashindo39/KANJIDIC2](https://huggingface.co/datasets/Akirashindo39/KANJIDIC2) dataset.
 
+ #### How to use
+ Run the following script in [Google Colab](https://colab.research.google.com/). It requires a GPU runtime (for example, a T4 GPU), because the pipeline is loaded in half precision (`torch.float16`) and moved to `cuda`. You will also need a Hugging Face access token for the `huggingface-cli login` step.
 
+ ```python
+ !pip install diffusers
+ # The cloned repo provides the training example script; it is not needed for inference
+ !git clone https://github.com/huggingface/diffusers
+ !huggingface-cli login
 
+ import os
+ from google.colab import drive
 
+ # Mount Google Drive to access persistent storage across Colab sessions
+ drive.mount('/content/drive')
 
+ # Work from the Google Drive root so the generated image is saved there
+ os.chdir("/content/drive/MyDrive")
+
+ from diffusers import StableDiffusionPipeline
+ import torch
+
+ # Release cached GPU memory left over from earlier cells
+ torch.cuda.empty_cache()
 
+ model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"
+
+ # Load the Stable Diffusion v1.4 base model in half precision on the GPU
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "CompVis/stable-diffusion-v1-4",
+     torch_dtype=torch.float16,
+     use_safetensors=True,
+ ).to("cuda")
+ # Apply the fine-tuned LoRA weights on top of the base model
+ pipe.load_lora_weights(model_path)
+
+ new_kanji_meaning = "internet"  # Enter new kanji meaning here
+ prompt = f"a Kanji meaning {new_kanji_meaning}"
+ image = pipe(prompt).images[0]
+ image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
+ ```
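 
 
  ## Training details
 
+ #### Hardware used: 8 GB RAM and a T4 GPU on Colab
+ Training used the diffusers example script `train_text_to_image_lora.py` (located under `examples/text_to_image` in the cloned diffusers repository) with the command below, and took approximately two hours.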
+
+ ```python
+ # Launch LoRA fine-tuning of Stable Diffusion v1.4 with accelerate
+ !accelerate launch train_text_to_image_lora.py \
+   --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
+   --dataset_name="Akirashindo39/KANJIDIC2" \
+   --image_column="image" \
+   --caption_column="text" \
+   --resolution=512 \
+   --random_flip \
+   --train_batch_size=1 \
+   --num_train_epochs=1 \
+   --checkpointing_steps=2000 \
+   --learning_rate=1e-04 \
+   --lr_scheduler="constant" \
+   --lr_warmup_steps=0 \
+   --seed=42 \
+   --output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
+   --validation_prompt="A kanji meaning Elon Musk" \
+   --push_to_hub
+ ```
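The README's inference script produces one image per run. The following is a minimal sketch for generating several meanings in one session, assuming a recent `diffusers` release in which `StableDiffusionPipeline.load_lora_weights` is available and can read this repository's LoRA weights, plus a CUDA GPU as in the Colab setup above. The helper names (`build_prompt`, `output_filename`, `generate_kanji`), the fixed seed, and the filename sanitizing are hypothetical illustrations, not part of the original README; only the prompt template mirrors the README's `f"a Kanji meaning {new_kanji_meaning}"`.

```python
import re


def build_prompt(meaning: str) -> str:
    """Build a prompt with the same template as the README's inference script."""
    return f"a Kanji meaning {meaning}"


def output_filename(meaning: str) -> str:
    """Turn a meaning into a filesystem-safe PNG name (hypothetical naming scheme)."""
    # Collapse every run of characters other than a-z and 0-9 into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", meaning.lower()).strip("-")
    # Fall back to a placeholder when nothing usable is left
    return f"{slug or 'untitled'}-kanji-v1-4.png"


def generate_kanji(meanings, seed=42,
                   model_path="Akirashindo39/kanji-diffusion-v1-4-kanjidic2"):
    """Generate and save one image per meaning; needs a CUDA GPU and diffusers."""
    # Imported lazily so the helpers above work without torch or diffusers installed
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the base model in half precision, then apply the LoRA weights
    # (assumes load_lora_weights accepts this repository's weight format)
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights(model_path)

    paths = []
    for meaning in meanings:
        # Re-seed for every prompt so each image depends only on its own meaning
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipe(build_prompt(meaning), generator=generator).images[0]
        path = output_filename(meaning)
        image.save(path)
        paths.append(path)
    return paths
```

For example, `generate_kanji(["internet", "Elon Musk"])` would save `internet-kanji-v1-4.png` and `elon-musk-kanji-v1-4.png` in the current directory. Re-seeding per prompt means that rerunning a single meaning with the same seed should reproduce its image on the same setup, regardless of its position in the list.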