SoteDiffusion Wuerstchen3
Anime finetune of Stable Cascade.
The model is currently at a very early stage of training.
Commercial use is not allowed, thanks to StabilityAI's license for Stable Cascade.
pip install diffusers transformers accelerate
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# Load the prior and the decoder in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

# Prior stage: turn the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40,
)

# Decoder stage: turn the image embeddings into a PIL image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10,
).images[0]

decoder_output.save("cascade.png")
GPU used for training: 1x AMD RX 7900 XTX 24GB
dataset name | training done | remaining |
---|---|---|
newest | 002 | 218 |
late | 002 | 204 |
mid | 002 | 199 |
early | 002 | 053 |
oldest | 002 | 014 |
pixiv | 002 | 072 |
visual novel cg | 002 | 068 |
anime wallpaper | 002 | 011 |
Total | 24 | 839 |
Note: chunk numbering starts at 0, so 002 means three chunks are done per dataset. There are 8000 images per chunk.
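The Total row follows from the zero-based chunk numbering. A minimal sketch of that arithmetic, with the per-dataset values copied from the table; the dictionary and helper names are illustrative only:

```python
IMAGES_PER_CHUNK = 8000

# dataset name -> (index of last trained chunk, remaining chunks), from the table
progress = {
    "newest": (2, 218),
    "late": (2, 204),
    "mid": (2, 199),
    "early": (2, 53),
    "oldest": (2, 14),
    "pixiv": (2, 72),
    "visual novel cg": (2, 68),
    "anime wallpaper": (2, 11),
}


def chunks_done(last_index: int) -> int:
    """Chunk numbering starts at 0, so index 2 means three finished chunks."""
    return last_index + 1


total_done = sum(chunks_done(last) for last, _ in progress.values())
total_remaining = sum(remaining for _, remaining in progress.values())
# Upper bound on images seen so far, assuming every finished chunk was full.
images_seen = total_done * IMAGES_PER_CHUNK

print(total_done, total_remaining, images_seen)
```

This gives 24 chunks done and 839 remaining, matching the Total row, which corresponds to at most 192,000 images seen so far.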
GPU used for captioning: 1x Intel ARC A770 16GB
Model used for captioning: SmilingWolf/wd-v1-4-convnextv2-tagger-v2
dataset name | total images | total chunks |
---|---|---|
newest | 1,766,335 | 221 |
late | 1,652,420 | 207 |
mid | 1,609,608 | 202 |
early | 442,368 | 056 |
oldest | 128,311 | 017 |
pixiv | 594,046 | 075 |
visual novel cg | 560,903 | 071 |
anime wallpaper | 106,882 | 014 |
Total | 6,860,873 | 863 |
Note: the smallest image size is 1280x600 (768,000 pixels).
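The chunk counts are consistent with splitting each dataset into chunks of 8000 images and rounding up. A small check under that assumption; `chunk_count` is a hypothetical helper name, and the image counts are copied from the table:

```python
IMAGES_PER_CHUNK = 8000  # chunk size stated in the training notes

image_counts = {
    "newest": 1_766_335,
    "late": 1_652_420,
    "mid": 1_609_608,
    "early": 442_368,
    "oldest": 128_311,
    "pixiv": 594_046,
    "visual novel cg": 560_903,
    "anime wallpaper": 106_882,
}


def chunk_count(n_images: int, chunk_size: int = IMAGES_PER_CHUNK) -> int:
    """Ceiling division: a partially filled last chunk still counts as a chunk."""
    return (n_images + chunk_size - 1) // chunk_size


chunks = {name: chunk_count(n) for name, n in image_counts.items()}
total_images = sum(image_counts.values())
total_chunks = sum(chunks.values())

# Pixel area of the smallest accepted image size, 1280x600.
min_pixels = 1280 * 600

print(chunks["newest"], total_chunks, total_images, min_pixels)
```

The per-dataset results reproduce the chunk column (221 for newest, 863 in total), and 1280 x 600 gives the 768,000 pixel minimum.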
Tag order in the captions: aesthetic tags, quality tags, date tags, custom tags, rest of the tags
tag | date |
---|---|
newest | 2022 to 2024 |
late | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
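The date tags amount to a lookup from year to era. A minimal sketch, assuming a hypothetical `date_tag` helper; the source does not say how years outside 2005 to 2024 are handled, so this sketch returns `None` for them:

```python
from typing import Optional

# (tag, first year, last year), both ends inclusive, copied from the table
DATE_TAGS = [
    ("newest", 2022, 2024),
    ("late", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the year is outside the table."""
    for tag, first, last in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None


print(date_tag(2023), date_tag(2016), date_tag(2004))
```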
Model used for aesthetic tags: shadowlilac/aesthetic-shadow
score greater than | tag |
---|---|
0.980 | extremely aesthetic |
0.900 | very aesthetic |
0.750 | aesthetic |
0.500 | slightly aesthetic |
0.350 | not displeasing |
0.250 | not aesthetic |
0.125 | slightly displeasing |
0.025 | displeasing |
rest of them | very displeasing |
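Read as code, the table is a descending threshold lookup. A sketch assuming the comparison is strictly greater than, as the column header suggests; `score_to_tag` is a hypothetical helper and not part of the scoring model:

```python
# (threshold, tag) pairs in descending order, copied from the table
AESTHETIC_THRESHOLDS = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]
FALLBACK_TAG = "very displeasing"  # the "rest of them" row


def score_to_tag(score, thresholds=AESTHETIC_THRESHOLDS, fallback=FALLBACK_TAG):
    """Return the tag of the first threshold that the score strictly exceeds."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


print(score_to_tag(0.93))
print(score_to_tag(0.01))
```

The quality tags use a table of the same shape, so the same function would work there with a different threshold list and fallback.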
Model used for quality tags: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag |
---|---|
0.980 | best quality |
0.900 | high quality |
0.750 | great quality |
0.500 | medium quality |
0.250 | normal quality |
0.125 | bad quality |
0.025 | low quality |
rest of them | worst quality |
dataset name | custom tag |
---|---|
image boards | date, |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
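Putting the pieces together, a caption is a comma-separated list in the stated order: aesthetic, quality, date, custom, then everything else. A sketch with a hypothetical `build_caption` helper; the trailing comma mirrors the example prompt, and the date is optional because the table only lists it for image boards and anime wallpapers:

```python
from typing import Optional, Sequence


def build_caption(
    aesthetic: str,
    quality: str,
    date: Optional[str],
    custom: Sequence[str],
    rest: Sequence[str],
) -> str:
    """Join tags in dataset order, skipping empty entries, with a trailing comma."""
    ordered = [aesthetic, quality, date, *custom, *rest]
    return ", ".join(tag for tag in ordered if tag) + ","


# Image board sample: has a date tag, no extra custom tags.
print(build_caption("extremely aesthetic", "best quality", "newest", [], ["1girl", "solo"]))

# Pixiv sample: no date tag, artist display name as the custom tag.
print(build_caption("aesthetic", "high quality", None, ["art by Display_Name"], ["1girl"]))
```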
Software used: Kohya SD-Scripts (Stable Cascade branch)
Base model: KBlueLeaf/Stable-Cascade-FP16-fixed
accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
--mixed_precision fp16 \
--save_precision fp16 \
--full_fp16 \
--sdpa \
--gradient_checkpointing \
--resolution "1024,1024" \
--train_batch_size 2 \
--gradient_accumulation_steps 32 \
--adaptive_loss_weight \
--learning_rate 4e-6 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 100 \
--optimizer_type adafactor \
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
--max_grad_norm 0 \
--token_warmup_min 1 \
--token_warmup_step 0 \
--shuffle_caption \
--caption_dropout_rate 0 \
--caption_tag_dropout_rate 0 \
--caption_dropout_every_n_epochs 0 \
--dataset_repeats 1 \
--save_state \
--save_every_n_steps 128 \
--sample_every_n_steps 32 \
--max_token_length 225 \
--max_train_epochs 1 \
--caption_extension ".txt" \
--max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--enable_bucket \
--min_bucket_reso 256 \
--max_bucket_reso 4096 \
--bucket_reso_steps 64 \
--bucket_no_upscale \
--log_with tensorboard \
--output_name sotediffusion-sc_3b \
--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
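For a sense of scale, the batch settings in the command imply an effective batch of 64 images per optimizer step. A back-of-the-envelope sketch, assuming a single GPU, a full chunk of 8000 images, and no partially filled batches from aspect-ratio bucketing; the variable names are illustrative:

```python
train_batch_size = 2               # --train_batch_size
gradient_accumulation_steps = 32   # --gradient_accumulation_steps
num_gpus = 1                       # assumption: one training GPU
images_per_chunk = 8000            # assumption: a full chunk

# Images contributing to each optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer updates for one pass over a chunk (--max_train_epochs 1).
optimizer_steps_per_chunk = images_per_chunk // effective_batch

print(effective_batch, optimizer_steps_per_chunk)
```

Under these assumptions one chunk is roughly 125 optimizer updates; bucketing can shift the real count slightly.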