Hatsu Preview2: A Bad Anime Model from Scratch in 1 GPU Day

Model Summary

Hatsu Preview2 is a 120M + 67M parameter image generation model. It was created with the main objective of setting a simple baseline for anime models. The model consists of a 120M parameter image generator and a 67M parameter condition generator.

The image generator operates in a 12x compressed latent space and creates 224x320 RGB images, while the condition generator creates 1024D condition vectors given a set of tags and a style embedding. For ease of use, the condition generator accepts arbitrary subsets of the tags instead of requiring the user to specify every tag. As in a previous exploration, the 1024D vectors are encodings created by a pretrained tagger. The condition supports a total of ~12k rating/general/character tags, although the model does not understand all of them well.
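The source does not say how the condition generator was trained to cope with partial tag lists. One common way to get that behavior is to randomly drop tags from each image's full tag list during training, so the model sees many different subsets. The sketch below illustrates that idea only; the function name `sample_tag_subset` and the `keep_prob` parameter are hypothetical and are not taken from the Hatsu training code.

```python
import random


def sample_tag_subset(tags, keep_prob=0.5, rng=None):
    """Keep each tag independently with probability keep_prob.

    Illustrative assumption: this is NOT confirmed to be the method
    used for Hatsu Preview2, just one plausible way to expose a model
    to arbitrary tag subsets during training.
    """
    rng = rng or random.Random()
    return [tag for tag in tags if rng.random() < keep_prob]


if __name__ == "__main__":
    full_tags = ["general", "dress", "long hair", "smile", "blue eyes"]
    # A seeded generator makes the sampled subset reproducible.
    print(sample_tag_subset(full_tags, keep_prob=0.5, rng=random.Random(0)))
```

With a scheme like this, a prompt as short as "general, dress" at inference time resembles inputs the model already saw during training.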

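The latents are produced by a lightweight autoencoder trained from scratch with 8x spatial compression and 16 latent channels. Each 8x8 RGB patch (8 x 8 x 3 = 192 values) maps to 16 latent values, which is where the 12x overall compression comes from.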
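As a quick sanity check on these numbers, the following sketch computes the latent tensor shape and the overall compression ratio for one image. It assumes 224 is the width and 320 the height (portrait orientation), and the function names are illustrative rather than part of any released code.

```python
def latent_shape(width, height, spatial_factor=8, latent_channels=16):
    """Return (channels, height, width) of the latent for one image."""
    if width % spatial_factor or height % spatial_factor:
        raise ValueError("image size must be divisible by the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def compression_ratio(width, height, spatial_factor=8, latent_channels=16):
    """Ratio of RGB pixel values to latent values."""
    c, h, w = latent_shape(width, height, spatial_factor, latent_channels)
    return (width * height * 3) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(224, 320))       # (16, 40, 28)
    print(compression_ratio(224, 320))  # 12.0
```

So the image generator works on a 16x40x28 tensor (17,920 values) instead of 215,040 pixel values.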

The training dataset is a 1.1M-image 1girl solo subset of the Danbooru2023 dataset, downscaled to 224x320. The image generator is trained for 25M images, and the condition generator is trained for 100M images. The combined training can be completed in ~1 day on a 4090 GPU.
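To put the sample counts in perspective, this small sketch converts "images seen" into approximate passes over the dataset. The dataset size is the rounded 1.1M figure quoted above, so the results are rough estimates only.

```python
DATASET_SIZE = 1_100_000          # approximate, from the "1.1M-image" figure
IMAGE_GEN_SAMPLES = 25_000_000    # images seen by the image generator
COND_GEN_SAMPLES = 100_000_000    # images seen by the condition generator


def epochs(samples_seen, dataset_size=DATASET_SIZE):
    """Approximate number of passes over the dataset."""
    return samples_seen / dataset_size


if __name__ == "__main__":
    print(f"image generator:     ~{epochs(IMAGE_GEN_SAMPLES):.1f} epochs")
    print(f"condition generator: ~{epochs(COND_GEN_SAMPLES):.1f} epochs")
```

This works out to roughly 23 epochs for the image generator and roughly 91 for the condition generator.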

Training actually ran a bit longer than 25M images, but no noticeable quality improvement was observed, presumably due to insufficient model capacity and/or inadequate loss weighting.

Qualitative Results

The generations are quite inconsistent. Or rather, consistently bad. As such, there is a lot of cherrypicking here. No grids, fewer samples.

The condition generator actually requires a style reference as input. Since a style generator hasn't been trained, all generations here use hand-picked style reference images. The style reference barely works anyway.

general, dress (style reference)

smile, blush, sensitive (style reference)

sensitive, large breasts, cat ears, long hair, purple eyes, swimsuit (style reference)

sensitive, black leotard, sideboob, blue eyes, blue hair (style reference)

general, white pantyhose, black hair, medium hair, maid, detached collar (style reference)

general, bench, black hair, black socks, blush, book, brown eyes, closed mouth, from above, holding, holding book, kneehighs, legs, looking at viewer, on bench, open book, pink shirt, shirt, short hair, short sleeves, sitting, skirt, smile, socks, t-shirt, thighs, white skirt (style reference)

general, bare shoulders, blue background, blue eyes, blue hair, breasts, choker, cleavage, dress, facing viewer, falling petals, hair behind ear, hair between eyes, lips, looking to the side, medium hair, off-shoulder dress, off shoulder, parted bangs, parted lips, petals, ribbon, ribbon choker, simple background, skirt hold, wind, yellow choker, yellow dress, yellow ribbon (style reference)

bare shoulders, between breasts, blunt bangs, breasts, brown hair, bug, butterfly, cleavage, closed mouth, dot nose, dress, field, flower, green eyes, hair flower, hair ornament, hairpin, jewelry, large breasts, long hair, looking at viewer, lying, necklace, sleeveless, sleeveless dress, sundress, thighs, very long hair, white butterfly, white dress, white flower, sensitive (style reference)

blue dress, blush, breasts, bug, bun cover, butterfly, butterfly on hand, china dress, chinese clothes, cleavage, cleavage cutout, clothing cutout, collared dress, contrapposto, cowboy shot, double bun, dress, frilled dress, frilled sleeves, frills, gold trim, hair bun, hand up, head tilt, hip vent, lace trim, long hair, looking at viewer, medium breasts, parted lips, purple eyes, single sidelock, standing, thigh strap, thighs, twintails, very long hair, white hair, sensitive (style reference)