SDXL latent to image

The model takes a 4-channel SDXL latent and decodes it to an image with the WanDecoder3d module.

Training begins with a short warmup phase, during which the model learns the color space. After the warmup, the head of the WanDecoder3d (imported and modified) is attached to the pipeline, which improves the stability of the decoded image.
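The head swap after warmup might be sketched as follows. This is only an illustration: the actual layer shapes of the WanDecoder3d head are not documented here, so the Conv2d dimensions below are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the warmup head and the pretrained
# WanDecoder3d head; the channel counts are assumptions.
warmup_head = nn.Conv2d(16, 3, kernel_size=3, padding=1)
pretrained_head = nn.Conv2d(16, 3, kernel_size=3, padding=1)

# After warmup, import the pretrained head's weights into the model.
warmup_head.load_state_dict(pretrained_head.state_dict())

# Optionally freeze the imported head so only the rest keeps training.
for p in warmup_head.parameters():
    p.requires_grad_(False)
```

Freezing the imported head keeps its decoded colors stable while the remaining layers continue to adapt.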

import torch
from diffusers import AutoencoderKLWan
from torchvision import transforms

# WanXL is the decoder wrapper defined in this repository.
if __name__ == '__main__':
    model = WanXL()
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')
    z = torch.randn(1, 4, 128, 128)  # SDXL latent, (B, C, H, W)
    x = model(z)  # video-shaped latent, (B, C, T, H, W)
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())

The SDXL latent was generated by this model.

As shown in the example, a target image size of 1024px is preferred, since the original encoded data is lossily compressed.
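A minimal sketch of why 1024px lines up with the 128x128 latent in the example, assuming the standard SDXL VAE spatial downsampling factor of 8:

```python
# The SDXL VAE compresses each spatial dimension by a factor of 8,
# so a 128x128 latent decodes back to a 1024x1024 image.
latent_size = 128
vae_downsample_factor = 8
target_size = latent_size * vae_downsample_factor
print(target_size)  # 1024
```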

Datasets

  • 12TPICS
  • jlbaker361/flickr_humans
Safetensors
Model size: 135M params
Tensor type: F32

Model tree for sugarquark/Kutches-Anomaly-Detection-and-Response
