Hatsu Preview2: A Bad Anime Model from Scratch in 1 GPU Day

See also: Hatsu Preview1

More experimental and less successful than the previous one. Still uploading for personal future reference. The code and a more comprehensive report will be provided upon proper release.

Model Summary

Hatsu Preview2 is an image generation model made of a 120M parameter image generator and a 67M parameter condition generator. It was created with the main objective of setting a simple baseline for anime models.

The image generator operates in a 12x compressed latent space and creates 224x320 RGB images, while the condition generator creates 1024D condition vectors given a set of tags and a style embedding. For ease of use, the condition generator accepts arbitrary subsets of the tags instead of requiring the user to specify every tag. As in a previous exploration, the 1024D vectors are encodings created by a pretrained tagger. The condition generator supports a total of ~12k rating/general/character tags, although not all of them are well understood.
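The card does not say how the condition generator copes with partial tag lists. One common approach, shown here purely as an assumption and not as the author's method, is to randomly drop tags when building training inputs, so the model learns to work from whatever subset it is given. The helper names `sample_tag_subset` and `multi_hot`, the `keep_prob` parameter, and the tiny vocabulary are all hypothetical.

```python
import random


def sample_tag_subset(tags, keep_prob=0.5, rng=None):
    """Keep each tag independently with probability keep_prob.

    Hypothetical training-time augmentation: the model only ever sees
    partial tag lists, so partial lists are fine at inference time too.
    """
    rng = rng or random.Random()
    return [t for t in tags if rng.random() < keep_prob]


def multi_hot(tags, vocab):
    """Encode a tag list as a 0/1 vector over a fixed vocabulary.

    Tags that are not in the vocabulary are silently ignored.
    """
    index = {t: i for i, t in enumerate(vocab)}
    vec = [0] * len(vocab)
    for t in tags:
        if t in index:
            vec[index[t]] = 1
    return vec


# Toy vocabulary; the real model is described as supporting ~12k tags.
vocab = ["general", "sensitive", "smile", "blush", "dress"]
full_tags = ["general", "smile", "dress"]

subset = sample_tag_subset(full_tags, keep_prob=0.5, rng=random.Random(0))
encoded = multi_hot(subset, vocab)
```

At inference time the same `multi_hot` encoding would be applied directly to whatever tags the user supplies, with no dropout.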

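The latents are produced by a lightweight autoencoder trained from scratch with 8x spatial compression and 16 latent channels. Each 8x8 RGB patch (8 x 8 x 3 = 192 values) maps to 16 latent values, which gives the 12x overall compression.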
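The relationship between the 8x spatial factor, the 16 latent channels, and the 12x figure can be checked with a little arithmetic. The sketch below is only a sanity check; the function names are made up for illustration, and the card does not say which of 224 and 320 is the height (the ratio is the same either way).

```python
def latent_shape(height, width, spatial_factor=8, latent_channels=16):
    """Return the (channels, height, width) shape of the latent tensor."""
    assert height % spatial_factor == 0 and width % spatial_factor == 0
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def compression_ratio(height, width, rgb_channels=3,
                      spatial_factor=8, latent_channels=16):
    """Ratio of image element count to latent element count."""
    c, h, w = latent_shape(height, width, spatial_factor, latent_channels)
    return (height * width * rgb_channels) / (c * h * w)


# Assumed orientation: 224 and 320 are passed in the order given in the card.
shape = latent_shape(224, 320)
ratio = compression_ratio(224, 320)
print(shape, ratio)  # (16, 28, 40) 12.0
```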

The training dataset is a 1.1M-image 1girl solo subset of the Danbooru2023 dataset, downscaled to 224x320. The image generator is trained for 25M images, and the condition generator is trained for 100M images. The combined training can be completed in ~1 day on a 4090 GPU.

The training actually ran a bit longer than 25M images, but no noticeable quality improvement was observed, presumably due to insufficient model capacity and/or inadequate loss weighting.
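To put the sample counts above in perspective, they can be converted into approximate passes over the dataset. This is rough arithmetic only: the 1.1M dataset size is itself rounded, and the sketch assumes the condition generator's 100M samples are drawn from the same 1.1M-image set, which the card does not state explicitly.

```python
DATASET_SIZE = 1_100_000  # approximate, as stated in the card


def passes_over_dataset(samples_seen, dataset_size=DATASET_SIZE):
    """Approximate number of epochs for a given number of training samples."""
    return samples_seen / dataset_size


image_gen_epochs = passes_over_dataset(25_000_000)
cond_gen_epochs = passes_over_dataset(100_000_000)

print(f"image generator:     ~{image_gen_epochs:.1f} epochs")   # ~22.7
print(f"condition generator: ~{cond_gen_epochs:.1f} epochs")    # ~90.9
```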

Qualitative Results

The generations are quite inconsistent. Or rather, consistently bad. As such, the samples here are heavily cherry-picked. No grids, fewer samples.

The condition generator actually requires a style reference as an input. Since a style generator hasn't been trained, all generations here use hand-picked style reference images. The style reference barely works anyway.


general, dress (style reference)

image/png image/png


smile, blush, sensitive (style reference)

image/png image/png image/png


sensitive, large breasts, cat ears, long hair, purple eyes, swimsuit (style reference)

image/png image/png image/png


sensitive, black leotard, sideboob, blue eyes, blue hair (style reference)

image/png image/png image/png


general, white pantyhose, black hair, medium hair, maid, detached collar (style reference)

image/png image/png image/png


general, bench, black hair, black socks, blush, book, brown eyes, closed mouth, from above, holding, holding book, kneehighs, legs, looking at viewer, on bench, open book, pink shirt, shirt, short hair, short sleeves, sitting, skirt, smile, socks, t-shirt, thighs, white skirt (style reference)

image/png image/png image/png


general, bare shoulders, blue background, blue eyes, blue hair, breasts, choker, cleavage, dress, facing viewer, falling petals, hair behind ear, hair between eyes, lips, looking to the side, medium hair, off-shoulder dress, off shoulder, parted bangs, parted lips, petals, ribbon, ribbon choker, simple background, skirt hold, wind, yellow choker, yellow dress, yellow ribbon (style reference)

image/png image/png image/png


bare shoulders, between breasts, blunt bangs, breasts, brown hair, bug, butterfly, cleavage, closed mouth, dot nose, dress, field, flower, green eyes, hair flower, hair ornament, hairpin, jewelry, large breasts, long hair, looking at viewer, lying, necklace, sleeveless, sleeveless dress, sundress, thighs, very long hair, white butterfly, white dress, white flower, sensitive (style reference)

image/png image/png image/png


blue dress, blush, breasts, bug, bun cover, butterfly, butterfly on hand, china dress, chinese clothes, cleavage, cleavage cutout, clothing cutout, collared dress, contrapposto, cowboy shot, double bun, dress, frilled dress, frilled sleeves, frills, gold trim, hair bun, hand up, head tilt, hip vent, lace trim, long hair, looking at viewer, medium breasts, parted lips, purple eyes, single sidelock, standing, thigh strap, thighs, twintails, very long hair, white hair, sensitive (style reference)

image/png image/png image/png
