---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B-Diffusers
---

# SDXL latent to image

The model takes a 4-channel SDXL latent and decodes it into an image with the [WanDecoder3d module](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers/tree/main/vae). During a short warmup phase, the model learned the color space; after that, the head of the WanDecoder3d was made part of the process, and the imported and modified head improved the stability of the decoded image.

```python
import torch
from diffusers import AutoencoderKLWan
from torchvision import transforms

if __name__ == '__main__':
    model = WanXL()  # model class defined in this repository
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')

    z = torch.randn(1, 4, 128, 128)  # SDXL latent (B, C, H, W)
    x = model(z)                     # Wan latent (B, C, T, H, W)
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())
```

The SDXL latent was generated by the VAE of this [model](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/tree/main/vae). As shown in the example, the target image size is preferably 1024px: the SDXL VAE downsamples by a factor of 8, so a 128x128 latent corresponds to a 1024x1024 image, and the lossy compression of the original encoding limits how much detail can be recovered at other sizes. A sketch of how such a latent can be produced appears after the dataset list below.

## Datasets

- 12TPICS
- jlbaker361/flickr_humans
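
For reference, here is a minimal sketch of producing the 4-channel input latent with the SDXL-family VAE linked above, assuming the standard `diffusers` `AutoencoderKL` API. The input file name is a placeholder, and whether WanXL expects the latent multiplied by the usual `scaling_factor` is not stated in this card.

```python
import torch
from diffusers import AutoencoderKL
from torchvision import transforms
from PIL import Image

# Load the SDXL-family VAE referenced above (noobai-XL-Vpred-1.0).
sdxl_vae = AutoencoderKL.from_pretrained('Laxhar/noobai-XL-Vpred-1.0', subfolder='vae')
sdxl_vae.eval()

# 'input.png' is a placeholder; 1024px matches the preferred target size,
# since the SDXL VAE downsamples 8x (1024 -> 128).
image = Image.open('input.png').convert('RGB').resize((1024, 1024))
pixels = transforms.functional.to_tensor(image).unsqueeze(0) * 2 - 1  # (B, 3, 1024, 1024) in [-1, 1]

with torch.no_grad():
    z = sdxl_vae.encode(pixels).latent_dist.sample()  # (B, 4, 128, 128)

# Depending on how WanXL was trained, the latent may also need the usual
# scaling factor applied (an assumption, not confirmed by this card):
# z = z * sdxl_vae.config.scaling_factor
```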