Akirashindo39/kanji-diffusion-v1-4-kanjidic2

@@ -16,28 +16,75 @@ tags:
 should probably proofread and complete it, then remove this comment. -->
-# LoRA text2image fine-tuning - Akirashindo39/kanji-diffusion-v1-4-kanjidic2
-These are LoRA adaption weights for CompVis/stable-diffusion-v1-4. The weights were fine-tuned on the Akirashindo39/KANJIDIC2 dataset. You can find some example images in the following.
-![img_0](./image_0.png)
-![img_1](./image_1.png)
-![img_2](./image_2.png)
-![img_3](./image_3.png)
-## Intended uses & limitations
-#### How to use
-```python
-# TODO: add an example code snippet for running this diffusion pipeline
-```
-#### Limitations and bias
-[TODO: provide examples of latent issues and potential remediations]
 ## Training details
-[TODO: describe the data used to train the model]

 should probably proofread and complete it, then remove this comment. -->
+# Akirashindo39/kanji-diffusion-v1-4-kanjidic2
+This is a text-to-image diffusion model that generates invented, Kanji-style characters for the meaning given in an English prompt.
+#### Fine-tuned Model Details
+- Developed by: Akira Shindo
+- Model type: Diffusion-based text-to-image generation model. The weights were LoRA fine-tuned from [Stable Diffusion v1.4](https://github.com/CompVis/stable-diffusion) on the [Akirashindo39/KANJIDIC2](https://huggingface.co/datasets/Akirashindo39/KANJIDIC2) dataset.
+#### How to use
+Run the following script in [Google Colab](https://colab.research.google.com/). Use a GPU runtime (such as a T4): the script moves the pipeline to `cuda`, and CPU inference would take a long time. You will need a Hugging Face access token for the login step.
+```python
+# Install diffusers; the clone is only needed for the training script under examples/text_to_image
+!pip install diffusers
+!git clone https://github.com/huggingface/diffusers
+# Log in with your Hugging Face access token
+!huggingface-cli login
+import os
+import torch
+from diffusers import StableDiffusionPipeline
+from google.colab import drive
+# Mount Google Drive to access persistent storage across Colab sessions
+drive.mount('/content/drive')
+# Work from Google Drive so the generated image is saved there
+os.chdir("/content/drive/MyDrive")
+# Release cached GPU memory left over from earlier cells
+torch.cuda.empty_cache()
+model_path = "Akirashindo39/kanji-diffusion-v1-4-kanjidic2"
+# Load the base model in half precision and move it to the GPU
+pipe = StableDiffusionPipeline.from_pretrained(
+    "CompVis/stable-diffusion-v1-4",
+    torch_dtype=torch.float16,
+    use_safetensors=True
+).to("cuda")
+# Apply the fine-tuned LoRA attention weights to the UNet
+pipe.unet.load_attn_procs(model_path)
+new_kanji_meaning = "internet" # Enter new kanji meaning here
+prompt = f"a Kanji meaning {new_kanji_meaning}"
+# Generate one image and save it to the current directory
+image = pipe(prompt).images[0]
+image.save(f"{new_kanji_meaning}-kanji-v1-4.png")
+```
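To generate characters for several meanings in one session, the prompt template and filename pattern from the script above can be wrapped in small helpers. The following is a minimal sketch, not part of the model card's original script: `build_prompt` and `output_filename` are hypothetical names, and lowercasing and hyphenating the filename is an added assumption so that multi-word meanings such as "Elon Musk" produce safe filenames.

```python
import re

# Same prompt template as the script above.
PROMPT_TEMPLATE = "a Kanji meaning {meaning}"


def build_prompt(meaning: str) -> str:
    """Return the prompt string for one English meaning."""
    return PROMPT_TEMPLATE.format(meaning=meaning.strip())


def output_filename(meaning: str) -> str:
    """Return a PNG filename for a meaning.

    Sanitizing to lowercase, hyphen-separated text is an assumption
    added here; the original script uses the meaning verbatim.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", meaning.strip()).strip("-").lower()
    return f"{slug}-kanji-v1-4.png"


meanings = ["internet", "Elon Musk"]
jobs = [(build_prompt(m), output_filename(m)) for m in meanings]
for prompt, filename in jobs:
    print(prompt, "->", filename)
    # With `pipe` loaded as in the script above, each job would run as:
    # pipe(prompt).images[0].save(filename)
```

The helpers are pure Python, so the prompts and filenames can be checked before spending GPU time on generation.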
 ## Training details
+#### Hardware Used: 8GB RAM and T4 GPU on Colab
+The training command below was run once and completed in approximately two hours.
+```python
+# Launch LoRA fine-tuning for text-to-image model with accelerate
+!accelerate launch train_text_to_image_lora.py \
+  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
+  --dataset_name="Akirashindo39/KANJIDIC2" \
+  --image_column="image" \
+  --caption_column="text" \
+  --resolution=512 \
+  --random_flip \
+  --train_batch_size=1 \
+  --num_train_epochs=1 \
+  --checkpointing_steps=2000 \
+  --learning_rate=1e-04 \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --seed=42 \
+  --output_dir="Akirashindo39/kanji-diffusion-v1-4-kanjidic2" \
+  --validation_prompt="A kanji meaning Elon Musk" \
+  --push_to_hub
+```
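With `--train_batch_size=1` and `--num_train_epochs=1`, the number of optimizer steps tracks the number of training images, and `--checkpointing_steps=2000` determines how many intermediate checkpoints are written. The sketch below works through that arithmetic. It assumes a single training process and a gradient accumulation of 1 (the command does not set `--gradient_accumulation_steps`); `training_schedule` is a hypothetical helper, and the dataset size of 10,000 is a placeholder, not the actual size of the dataset.

```python
import math


def training_schedule(num_images, batch_size=1, epochs=1,
                      grad_accum=1, checkpointing_steps=2000):
    """Estimate total optimizer steps and the steps at which checkpoints are saved.

    Assumes one process; grad_accum=1 is an assumption, since the
    training command above does not pass that flag.
    """
    batches_per_epoch = math.ceil(num_images / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_steps = epochs * steps_per_epoch
    # A checkpoint is written each time the step count hits a multiple
    # of checkpointing_steps.
    checkpoints = list(range(checkpointing_steps, total_steps + 1, checkpointing_steps))
    return total_steps, checkpoints


# Placeholder dataset size for illustration only.
total_steps, checkpoints = training_schedule(num_images=10_000)
print("total steps:", total_steps)
print("checkpoints at steps:", checkpoints)
```

Substituting the real number of images in the dataset gives a rough step count to compare against the two-hour runtime reported above.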