Update README.md
README.md
CHANGED
@@ -16,28 +16,75 @@ tags:
Removed from the previous README: the `#` heading line and the placeholder `[TODO: provide examples of latent issues and potential remediations]`.

should probably proofread and complete it, then remove this comment. -->

# Akirashindo39/kanji-diffusion-v1-4-kanjidic2
This model is a text-to-image diffusion model that generates ("hallucinates") new Kanji-style characters from an English prompt.

#### Fine-tuned Model Details
- Developed by: Akira Shindo
- Model type: Diffusion-based text-to-image generation model, fine-tuned with LoRA from [Stable Diffusion v1.4](https://github.com/CompVis/stable-diffusion) on the [Akirashindo39/KANJIDIC2](https://huggingface.co/datasets/Akirashindo39/KANJIDIC2) dataset.

#### How to use
Run the following script in [Google Colab](https://colab.research.google.com/). As written, the script loads the pipeline in half precision on `cuda`, so select a GPU runtime (such as a T4). You will also need a Hugging Face access token for the `huggingface-cli login` step.

```python
!pip install diffusers
# The clone is only needed for the training script under examples/text_to_image
!git clone https://github.com/huggingface/diffusers
!huggingface-cli login

import os
from google.colab import drive

# Mount Google Drive to access persistent storage across Colab sessions
drive.mount('/content/drive')

# Navigate to the project directory in Google Drive
os.chdir("/content/drive/MyDrive")

from diffusers import StableDiffusionPipeline
import torch

torch.cuda.empty_cache()

model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"

# Load the base model on the GPU, then attach the fine-tuned LoRA attention weights
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True
).to("cuda")
pipe.unet.load_attn_procs(model_path)

new_kanji_meaning = "internet"  # Enter new kanji meaning here
prompt = f"a Kanji meaning {new_kanji_meaning}"
image = pipe(prompt).images[0]
image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
```
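
To generate characters for several meanings in one session, the prompt and output filename can be built by small helpers. The sketch below is an illustration only: `build_prompt`, `output_filename`, and the `meanings` list are hypothetical names that are not part of the model repository. It reuses the prompt wording and filename pattern from the script above.

```python
import re

# Same prompt wording as the usage script above
PROMPT_TEMPLATE = "a Kanji meaning {meaning}"


def build_prompt(meaning: str) -> str:
    """Return the prompt string for one English meaning (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(meaning=meaning.strip())


def output_filename(meaning: str, suffix: str = "kanji-v1-4") -> str:
    """Return a filesystem-friendly PNG name for one meaning (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^A-Za-z0-9]+", "-", meaning.strip()).strip("-").lower()
    return f"{slug}-{suffix}.png"


# Example meanings chosen for illustration
meanings = ["internet", "Elon Musk", "machine learning"]
jobs = [(build_prompt(m), output_filename(m)) for m in meanings]

for prompt, filename in jobs:
    print(prompt, "->", filename)
    # With the pipeline loaded as in the usage script:
    # pipe(prompt).images[0].save(filename)
```

The pipeline call is left as a comment so the helpers can be checked without a GPU; uncomment it in the Colab session once `pipe` is loaded.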

## Training details

#### Hardware Used: 8GB RAM and T4 GPU on Colab
Training was run with the command below and completed in approximately two hours.

```python
# Launch LoRA fine-tuning for text-to-image model with accelerate
# (train_text_to_image_lora.py ships in examples/text_to_image of the diffusers repo)
!accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="Akirashindo39/KANJIDIC2" \
  --image_column="image" \
  --caption_column="text" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=1 \
  --checkpointing_steps=2000 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
  --validation_prompt="A kanji meaning Elon Musk" \
  --push_to_hub
```
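
The flags above determine how many optimizer steps one epoch takes and how many intermediate checkpoints are written. The sketch below estimates both, assuming a single process and no gradient accumulation unless stated. `training_schedule` is a hypothetical helper, and the dataset size of 6000 is a placeholder for illustration, not the actual size of the KANJIDIC2 dataset.

```python
import math


def training_schedule(num_examples, train_batch_size=1,
                      gradient_accumulation_steps=1,
                      num_train_epochs=1, checkpointing_steps=2000):
    """Estimate total optimizer steps and intermediate checkpoints (hypothetical helper)."""
    # Batches per epoch, then optimizer updates per epoch after accumulation
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    total_steps = steps_per_epoch * num_train_epochs
    # A checkpoint is assumed to be written each time the step count
    # reaches a multiple of checkpointing_steps
    num_checkpoints = total_steps // checkpointing_steps
    return total_steps, num_checkpoints


# Placeholder dataset size; substitute the real number of training images
total_steps, num_checkpoints = training_schedule(num_examples=6000)
print(f"total steps: {total_steps}, intermediate checkpoints: {num_checkpoints}")
```

With `--train_batch_size=1` and `--num_train_epochs=1`, the step count equals the number of training images, so the two-hour run time scales roughly with dataset size.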