metadata
base_model: CompVis/stable-diffusion-v1-4
library_name: diffusers
license: creativeml-openrail-m
inference: true
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- diffusers
- diffusers-training
- lora
Akirashindo39/kanji-diffusion-v1-4-kanjidic2
This is a text-to-image diffusion model that generates ("hallucinates") new Kanji-like characters from an English prompt.
Fine-tuned Model Details
- Developed by: Akira Shindo
- Model type: Diffusion-based text-to-image generation model, fine-tuned from Stable Diffusion v1.4 with LoRA on the Akirashindo39/KANJIDIC2 dataset.
How to use
Run the following script in Google Colab. A GPU runtime (such as a T4) is recommended; without one, generation will take a long time. You will also need a Hugging Face access token for the login step.
!pip install diffusers
!git clone https://github.com/huggingface/diffusers
!huggingface-cli login
import os
from google.colab import drive
# Mount Google Drive to access persistent storage across Colab sessions
drive.mount('/content/drive')
# Navigate to the project directory in Google Drive
os.chdir("/content/drive/MyDrive")
import torch
from diffusers import StableDiffusionPipeline

# Release cached GPU memory left over from earlier cells
torch.cuda.empty_cache()

model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"

# Load the base model in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# Load the fine-tuned LoRA weights into the UNet
pipe.unet.load_attn_procs(model_path)
# Make sure the newly loaded weights are on the GPU as well
pipe.to("cuda")

new_kanji_meaning = "internet"  # Enter new kanji meaning here
prompt = f"a Kanji meaning {new_kanji_meaning}"
image = pipe(prompt).images[0]
image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
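To generate several characters in one run, the prompt and file-name pattern from the script above can be wrapped in small helpers. This is only a sketch: `build_prompt`, `output_path`, and `generate_all` are hypothetical names that are not part of diffusers, and replacing spaces with underscores in file names is an added assumption. It expects `pipe` to be the pipeline created above.

```python
from pathlib import Path

# Same prompt wording as the single-image script above
PROMPT_TEMPLATE = "a Kanji meaning {meaning}"


def build_prompt(meaning: str) -> str:
    """Turn an English meaning into the prompt format used above."""
    return PROMPT_TEMPLATE.format(meaning=meaning.strip())


def output_path(meaning: str, out_dir: str = ".") -> Path:
    """Build the output file name; spaces become underscores (an assumption)."""
    safe = meaning.strip().replace(" ", "_")
    return Path(out_dir) / f"{safe}-kanji-v1-4.png"


def generate_all(pipe, meanings, out_dir: str = "."):
    """Run the pipeline once per meaning, save each image, return the paths."""
    saved = []
    for meaning in meanings:
        image = pipe(build_prompt(meaning)).images[0]
        path = output_path(meaning, out_dir)
        image.save(path)
        saved.append(path)
    return saved
```

With the pipeline loaded, a call such as `generate_all(pipe, ["internet", "skyscraper"])` would write one PNG per meaning into the current directory.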
Training details
Hardware Used: 8GB RAM and T4 GPU on Colab
Training was run with the command below and completed in approximately two hours.
# Launch LoRA fine-tuning for text-to-image model with accelerate
!accelerate launch train_text_to_image_lora.py \
--pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
--dataset_name="Akirashindo39/KANJIDIC2" \
--image_column="image" \
--caption_column="text" \
--resolution=512 \
--random_flip \
--train_batch_size=1 \
--num_train_epochs=1 \
--checkpointing_steps=2000 \
--learning_rate=1e-04 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--seed=42 \
--output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
--validation_prompt="A kanji meaning Elon Musk" \
--push_to_hub
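As a rough guide to what these flags imply, the number of optimizer steps and the steps at which checkpoints are written can be estimated from the dataset size. The sketch below assumes a single training process, a gradient accumulation of 1 (it is not set in the command above), and that a checkpoint is written whenever the step count is a multiple of `--checkpointing_steps`. The helper `training_schedule` and the dataset size of 6000 images are hypothetical; the actual size of Akirashindo39/KANJIDIC2 is not stated here.

```python
import math


def training_schedule(num_images, train_batch_size=1, num_train_epochs=1,
                      checkpointing_steps=2000, gradient_accumulation_steps=1):
    """Estimate total optimizer steps and the steps at which checkpoints land."""
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    total_steps = steps_per_epoch * num_train_epochs
    # Checkpoints are assumed to fall on every multiple of checkpointing_steps
    checkpoints = list(range(checkpointing_steps, total_steps + 1, checkpointing_steps))
    return total_steps, checkpoints


# Hypothetical dataset size, used only to illustrate the arithmetic
total, checkpoints = training_schedule(num_images=6000)
print(total, checkpoints)  # 6000 [2000, 4000, 6000]
```

With a batch size of 1 and a single epoch, the step count equals the number of training images, so the two-hour run above corresponds to one pass over the dataset.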