---
license: mit
---
|
|
|
# VAE for high-resolution image generation with Stable Diffusion
|
|
|
This VAE is trained by adding only one step of noise to the latent and then denoising it with the U-Net, so that the VAE becomes less sensitive to small perturbations in the latent.
|
This reduces the tendency, at high resolutions, to render some objects (such as plants and eyes) in far more detail than their surroundings.
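The description of "one step of noise" is terse, so here is a minimal sketch of what a single DDPM forward-diffusion step at t=1 looks like. The schedule value `alpha_bar_1` and the function names are illustrative assumptions, not the actual training code:

```python
import numpy as np

def one_step_noise(latent, alpha_bar_1=0.9999, seed=0):
    """Apply a single DDPM forward step (t=1) to a latent.

    Returns the slightly noised latent and the noise sample that the
    U-Net would be trained to predict. alpha_bar_1 is the cumulative
    schedule value at t=1 (close to 1, so very little noise is added).
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(latent.shape)
    noised = np.sqrt(alpha_bar_1) * latent + np.sqrt(1.0 - alpha_bar_1) * eps
    return noised, eps

def denoise(noised, eps_pred, alpha_bar_1=0.9999):
    """Invert the forward step given a (predicted) noise sample."""
    return (noised - np.sqrt(1.0 - alpha_bar_1) * eps_pred) / np.sqrt(alpha_bar_1)
```

With a perfect noise prediction, `denoise(one_step_noise(x)[0], eps)` recovers the clean latent exactly; the VAE is then trained against these lightly perturbed, denoised latents rather than the raw ones.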
|
The dataset consists of 19k images tagged nijijourneyv5 and published on the web; denoising was performed with [models trained on the same dataset](https://huggingface.co/Ai-tensa/FlexWaifu).
|
|
|
## sample |
|
|
|
![](xyz_grid-0011-1798392412.png) |
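To try the VAE, it can be swapped into a Stable Diffusion pipeline with the standard diffusers API. This is a hypothetical sketch: the repo id `your-name/this-vae` is a placeholder for this model's actual Hugging Face id, and `runwayml/stable-diffusion-v1-5` is just an example base model:

```python
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Placeholder repo id -- replace with this model's actual Hugging Face id.
vae = AutoencoderKL.from_pretrained("your-name/this-vae")

# Example base pipeline; any SD checkpoint with a compatible latent space works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae
)
image = pipe("a plant in sunlight").images[0]
image.save("sample.png")
```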
|
|
|
## training details |
|
|
|
- base model: [VAE developed by CompVis](https://github.com/CompVis/latent-diffusion) |
|
- 19k images
|
- 2 epochs |
|
- Aspect Ratio Bucketing based on 768p resolution |
|
- multires noise |
|
- lr: 1e-5 |
|
- precision: fp32 |
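The exact multires-noise settings are not given above; the following is a generic sketch of the pyramid-noise idea it refers to, where noise drawn at progressively coarser resolutions is upsampled and summed with geometrically decaying weights. The parameters `iterations` and `discount` are illustrative assumptions:

```python
import numpy as np

def multires_noise(shape, iterations=6, discount=0.3, seed=0):
    """Multi-resolution (pyramid) noise for a latent of shape (C, H, W).

    Starts from full-resolution Gaussian noise, adds noise drawn at
    half, quarter, ... resolution (nearest-neighbour upsampled back to
    full size) weighted by discount**i, then renormalises to unit std.
    """
    rng = np.random.default_rng(seed)
    c, h, w = shape
    noise = rng.standard_normal(shape)
    for i in range(1, iterations):
        hh, ww = max(1, h // 2**i), max(1, w // 2**i)
        coarse = rng.standard_normal((c, hh, ww))
        # Nearest-neighbour upsample via Kronecker product, then crop.
        rh, rw = -(-h // hh), -(-w // ww)  # ceil division
        up = np.kron(coarse, np.ones((1, rh, rw)))[:, :h, :w]
        noise = noise + (discount ** i) * up
        if hh == 1 and ww == 1:
            break
    return noise / noise.std()
```

The low-frequency components this adds are what push the model away from over-detailing small regions relative to the rest of the image.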