Check out my blog!

You can use 6 numbers to fully describe the style of an (anime) image!

What is it and what can it do?

Many diffusion models use artist tags to control the style of output images. I am really not a fan of that, for three reasons:

  1. Many artists share very similar styles, making many artist tags redundant.
  2. Some artists have more than one distinct art style in their works. A basic example is sketches versus finished images.
  3. Artist tags are prone to content bleeding. If the artist you choose draws a lot of repeated content, that content will very likely bleed into your output even though you did not prompt for it.

One way to overcome this is to use a style embedding model. It is a model that takes in images of arbitrary size and outputs a style vector for each image. The style vector lives in an N-dimensional space and is essentially just a list of N numbers. Each number in the list corresponds to a specific style element of the input image.

Images with similar styles should have similar embeddings (low distance), while images with different styles should have embeddings that are far apart (high distance).
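As a rough illustration of what "low distance" and "high distance" mean, here is a minimal sketch that compares style vectors with plain Euclidean distance. The three 6-number vectors are made up for the example and are not real model outputs, and Euclidean distance is an assumption here; the model may well be better compared with a different metric.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two style vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 6-dimensional style vectors, invented for illustration only.
sketch_a = [0.9, -0.2, 0.1, 0.4, -0.7, 0.3]
sketch_b = [0.8, -0.1, 0.2, 0.5, -0.6, 0.2]
painting = [-0.5, 0.7, -0.9, 0.0, 0.6, -0.4]

d_similar = euclidean_distance(sketch_a, sketch_b)
d_different = euclidean_distance(sketch_a, painting)

print(f"sketch_a vs sketch_b: {d_similar:.3f}")
print(f"sketch_a vs painting: {d_different:.3f}")
```

With these made-up vectors the two sketches end up much closer to each other than either is to the painting, which is the behaviour the embedding is meant to have.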

The included Python files give minimal usage examples. minimal_script.py provides the minimal code for running an image through the network and obtaining an output, while gallery_review.py contains the code I used to generate the visualisations and clustering.

Training data is here.

Training Hyperparameters

For the current version (v3):

Training was done using PyTorch Lightning.

lr = 0.0001

weight_decay = 0.0001

AdEMAMix optimizer

ExponentialLR scheduler, with a gamma of 0.99, applied every epoch.

Batch size of 1, with accumulate_grad_batches set to 16.

For every anchor image, 16 positive images and 16 negative images are used.

Trained for 15 epochs on 2 A100 GPUs, for a total of 3434 optimizer updates.
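To make these numbers a bit more concrete, here is a small sketch of the arithmetic they imply. It only uses the values listed above. The per-update image count assumes that one batch item means one anchor together with its 16 positives and 16 negatives, and it is counted per training process, since how the work was split across the 2 GPUs is not stated here.

```python
# Values copied from the hyperparameter list above.
base_lr = 0.0001
gamma = 0.99
epochs = 15
batch_size = 1
accumulate_grad_batches = 16
positives_per_anchor = 16
negatives_per_anchor = 16


def lr_at_epoch(epoch):
    """Learning rate during a 0-based epoch under an exponential decay
    that multiplies the rate by gamma once per epoch."""
    return base_lr * gamma ** epoch


schedule = [lr_at_epoch(e) for e in range(epochs)]

# Gradient accumulation: anchors contributing to one optimizer update,
# counted per training process (assumption, see the note above).
anchors_per_update = batch_size * accumulate_grad_batches
images_per_anchor = 1 + positives_per_anchor + negatives_per_anchor
images_per_update = anchors_per_update * images_per_anchor

print(f"lr in first epoch: {schedule[0]:.6e}")
print(f"lr in last epoch:  {schedule[-1]:.6e}")
print(f"anchors per optimizer update: {anchors_per_update}")
print(f"images per optimizer update:  {images_per_update}")
```

The learning rate decays only mildly over the 15 epochs, so the schedule acts more like a gentle cooldown than an aggressive anneal.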


Dataset used to train Fgdfgfthgr/Anime_Images_Style_Embedder