metadata

base_model: CompVis/stable-diffusion-v1-4
library_name: diffusers
license: creativeml-openrail-m
inference: true
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - diffusers
  - diffusers-training
  - lora

Akirashindo39/kanji-diffusion-v1-4-kanjidic2

This model is a text-to-image diffusion model that hallucinates Kanji-style characters for any English prompt, such as "a Kanji meaning internet".

Fine-tuned Model Details

Developed by: Akira Shindo
Model type: Diffusion-based text-to-image generation model; a LoRA fine-tune of Stable Diffusion v1.4 trained on the Akirashindo39/KANJIDIC2 dataset.
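Because this is a LoRA fine-tune (see the `lora` tag), the repository holds only small low-rank matrices that are added on top of the frozen Stable Diffusion v1.4 weights, not a full copy of the UNet. The numpy sketch below illustrates the idea for a single linear layer; the layer size, the rank of 4 and the scale of 1.0 are illustrative assumptions, not values read from this model.

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """Linear layer with a LoRA update: y = x W^T + scale * (x A^T) B^T.

    W: frozen base weight, shape (out_features, in_features)
    A: trainable down-projection, shape (rank, in_features)
    B: trainable up-projection, shape (out_features, rank)
    """
    return x @ W.T + scale * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
in_features, out_features, rank = 320, 320, 4  # illustrative sizes only

W = rng.standard_normal((out_features, in_features))
A = rng.standard_normal((rank, in_features)) * 0.01
B = np.zeros((out_features, rank))  # zero init: the adapter starts as a no-op

x = rng.standard_normal((2, in_features))
y = lora_forward(x, W, A, B)

# Parameter count of the full layer vs. the two LoRA factors
full_params = W.size            # 102400
lora_params = A.size + B.size   # 2560
```

With `B` initialized to zero the adapted layer starts out identical to the base layer, and training only updates the small `A` and `B` factors, which is why the adapter is far smaller than the base model it modifies.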

How to use

The script below is written for Google Colab. Use a GPU runtime (for example a T4): as written, the script loads the model in float16 and moves it to "cuda", so it will not run on a CPU-only runtime. You also need a Hugging Face access token for the login step.
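Before running the script, you can confirm that the Colab runtime actually has a CUDA GPU attached (Runtime > Change runtime type). This check is a convenience sketch, not part of the original script; it relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util

def describe_runtime() -> str:
    """Report whether PyTorch can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch
    if torch.cuda.is_available():
        # Device 0 is the single GPU on a standard Colab GPU runtime
        return f"CUDA GPU available: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU found; switch the runtime to a GPU before running the script"

print(describe_runtime())
```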

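```python
# Install dependencies (transformers provides the CLIP text encoder)
!pip install diffusers transformers accelerate
# Clone the diffusers repo (only needed for the training script below)
!git clone https://github.com/huggingface/diffusers
# Log in with your Hugging Face access token
!huggingface-cli login

import os
from google.colab import drive

# Mount Google Drive so generated images persist across Colab sessions
drive.mount('/content/drive')

# Save outputs to the root of your Google Drive
os.chdir("/content/drive/MyDrive")

from diffusers import StableDiffusionPipeline
import torch

# Free any cached GPU memory left over from earlier cells
torch.cuda.empty_cache()

model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"

# Load the Stable Diffusion v1.4 base model in half precision on the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True
).to("cuda")
# Apply the fine-tuned LoRA weights to the UNet's attention layers
# (newer diffusers releases may deprecate this; pipe.load_lora_weights(model_path) is the usual alternative)
pipe.unet.load_attn_procs(model_path)

new_kanji_meaning = "internet"  # Enter new kanji meaning here
prompt = f"a Kanji meaning {new_kanji_meaning}"
image = pipe(prompt).images[0]
image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
```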
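To render several meanings in one session, you can loop over a list of prompts. Because the script puts the meaning directly into the output file name, a meaning containing a slash (e.g. "AC/DC") would be treated as a directory path. The helpers below (`kanji_prompt`, `output_filename`) are hypothetical additions, not part of the original card; they keep the script's prompt template and file-name pattern but replace unsafe characters with underscores.

```python
import re

def kanji_prompt(meaning: str) -> str:
    # Same prompt template as the script above
    return f"a Kanji meaning {meaning}"

def output_filename(meaning: str) -> str:
    # Replace runs of characters other than word characters and '-' with '_'
    # so meanings like "Elon Musk" or "AC/DC" give valid file names
    slug = re.sub(r"[^\w-]+", "_", meaning.strip()).strip("_")
    return f"{slug or 'kanji'}-kanji-v1-4.png"

meanings = ["internet", "Elon Musk", "AC/DC"]  # example meanings
jobs = [(kanji_prompt(m), output_filename(m)) for m in meanings]

# With `pipe` from the script above, each job could then be rendered with:
# for prompt, filename in jobs:
#     pipe(prompt).images[0].save(filename)
```

For repeatable images across runs, the pipeline also accepts a seeded generator, e.g. `pipe(prompt, generator=torch.Generator(device="cuda").manual_seed(42))`.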

Training details

Hardware Used: 8GB RAM and T4 GPU on Colab

Training was run with the command below and took approximately two hours.

```python
# Launch LoRA fine-tuning of the text-to-image model with accelerate.
# The script lives in diffusers/examples/text_to_image in the cloned repo;
# run this from that folder after installing its requirements.txt.
!accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="Akirashindo39/KANJIDIC2" \
  --image_column="image" \
  --caption_column="text" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=1 \
  --checkpointing_steps=2000 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
  --validation_prompt="A kanji meaning Elon Musk" \
  --push_to_hub
```
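With `--train_batch_size=1` and `--num_train_epochs=1`, the number of optimizer steps equals the number of training examples, and `--checkpointing_steps=2000` sets how often intermediate checkpoints are written. The sketch below does this arithmetic, assuming a single GPU and the script's default gradient accumulation of 1; the dataset size in the example call is a placeholder, not the actual size of the KANJIDIC2 dataset.

```python
import math

def training_schedule(num_examples: int, batch_size: int, epochs: int,
                      grad_accum: int = 1, checkpointing_steps: int = 2000):
    """Return (total optimizer steps, number of intermediate checkpoints)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_steps = epochs * steps_per_epoch
    # A checkpoint is written each time the step count reaches a multiple of checkpointing_steps
    num_checkpoints = total_steps // checkpointing_steps
    return total_steps, num_checkpoints

# Placeholder dataset size, for illustration only
steps, ckpts = training_schedule(num_examples=10_000, batch_size=1, epochs=1)
print(steps, ckpts)  # 10000 5
```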