Heasterian/AsymmetricAutoencoderKLUpscaler

This is a simple 2x upscaler built on AsymmetricAutoencoderKL. I changed the training code a lot partway through, so it's nothing scientific; I was just pleased with the results from something that easy to train.

Training was done with the AdEMAMix optimizer on a dataset of ~4k images, mostly photos and digital art plus a small number of PBR textures. I then did some finetuning on the same dataset with the ADOPT optimizer combined with OrthoGrad from "Grokking at the Edge of Numerical Stability" (arXiv: 2501.04697). The model was trained at 96px x 96px input resolution (so 192px x 192px output).
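OrthoGrad replaces each parameter's gradient with its component orthogonal to the parameter itself, so an update cannot simply scale the weights along their current direction. Below is a minimal PyTorch sketch, assuming the projection is done per parameter tensor and the projected gradient is rescaled back to its original L2 norm. The function name `orthograd_` is hypothetical, and because ADOPT is not part of core PyTorch, `torch.optim.AdamW` stands in for it in the usage example. This is not the training code used for this model.

```python
import torch


@torch.no_grad()
def orthograd_(params, eps: float = 1e-30) -> None:
    """Hypothetical helper: make each gradient orthogonal to its weights, in place."""
    for p in params:
        if p.grad is None:
            continue
        w = p.detach().reshape(-1).float()
        g = p.grad.reshape(-1).float()
        # Remove the component of g that points along w (the radial direction).
        proj = torch.dot(w, g) / (torch.dot(w, w) + eps)
        g_orth = g - proj * w
        # Assumption: rescale so the gradient keeps its original L2 norm.
        g_orth = g_orth * (g.norm() / (g_orth.norm() + eps))
        p.grad.copy_(g_orth.view_as(p.grad))


if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    # AdamW is only a stand-in for the ADOPT optimizer mentioned above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    orthograd_(model.parameters())  # project gradients before the optimizer step
    optimizer.step()
```

Calling the projection between `backward()` and `optimizer.step()` keeps it independent of the base optimizer, which is why it can be combined with ADOPT or any other optimizer.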

For the loss, most of the time I used a simple HSL loss (1 - cosine of the difference between the target and predicted H channel, plus L1 loss for the S and L channels), LPIPS+ and DISTS.
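The HSL term can be sketched as below. This is a minimal PyTorch version, assuming RGB inputs in [0, 1], hue stored as a fraction of a full turn (so it is multiplied by 2π before taking the cosine), and equal weighting of the hue and S/L terms. The helper names `rgb_to_hsl` and `hsl_loss` are hypothetical, and LPIPS+ and DISTS are not included.

```python
import math

import torch
import torch.nn.functional as F


def rgb_to_hsl(rgb: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Convert an (N, 3, H, W) RGB tensor in [0, 1] to HSL, each channel in [0, 1]."""
    r, g, b = rgb.unbind(dim=1)
    cmax, _ = rgb.max(dim=1)
    cmin, _ = rgb.min(dim=1)
    delta = cmax - cmin

    # Lightness: midpoint of the largest and smallest channel.
    l = (cmax + cmin) / 2
    # Saturation: zero for greys (delta == 0).
    s = torch.where(delta > eps, delta / (1 - (2 * l - 1).abs() + eps), torch.zeros_like(delta))

    # Hue in sixths of a turn, picked by which channel is the maximum.
    safe_delta = delta.clamp_min(eps)
    h_r = ((g - b) / safe_delta) % 6
    h_g = (b - r) / safe_delta + 2
    h_b = (r - g) / safe_delta + 4
    h = torch.where(cmax == r, h_r, torch.where(cmax == g, h_g, h_b))
    h = torch.where(delta > eps, h / 6, torch.zeros_like(h))
    return torch.stack([h, s, l], dim=1)


def hsl_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical HSL loss: circular hue term plus L1 on saturation and lightness."""
    pred_hsl = rgb_to_hsl(pred.clamp(0, 1))
    target_hsl = rgb_to_hsl(target.clamp(0, 1))
    # Hue is circular: 1 - cos(angle) is 0 for equal hues and 2 for opposite hues.
    hue_diff = (pred_hsl[:, 0] - target_hsl[:, 0]) * 2 * math.pi
    hue_term = (1 - torch.cos(hue_diff)).mean()
    # Saturation and lightness are compared with plain L1.
    sl_term = F.l1_loss(pred_hsl[:, 1:], target_hsl[:, 1:])
    return hue_term + sl_term
```

Using the cosine for hue avoids the wrap-around problem of a plain L1 on H, where hues of 0.01 and 0.99 are almost the same color but numerically far apart.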

The model has trouble handling JPEG artifacts because I couldn't train it on random compression levels: torchvision.transforms.v2.JPEG doesn't support ROCm. For such images, it's better to downscale the image a bit before upscaling.

This is a proof-of-concept model. It can't be used commercially as is, but I may train a new version in the future on a CC0 dataset whose license permits commercial use, with better handling of JPEG artifacts.

You can run the model with the code below:

```python
import torch
from torchvision import transforms, utils

from diffusers import AsymmetricAutoencoderKL
from diffusers.utils import load_image


def crop_image_to_nearest_divisible_by_8(img):
    """Center-crop a (C, H, W) tensor so that H and W are both divisible by 8."""
    height, width = img.shape[1], img.shape[2]
    if height % 8 == 0 and width % 8 == 0:
        return img

    # Calculate the closest lower resolution divisible by 8
    new_height = height - (height % 8)
    new_width = width - (width % 8)

    # CenterCrop only crops, so it takes no interpolation argument
    transform = transforms.CenterCrop((new_height, new_width))
    return transform(img).to(torch.float32).clamp(-1, 1)


to_tensor = transforms.ToTensor()  # PIL image -> float tensor in [0, 1]

vae = AsymmetricAutoencoderKL.from_pretrained(
    "Heasterian/AsymmetricAutoencoderKLUpscaler", torch_dtype=torch.float32
)
vae.requires_grad_(False)
vae.eval()

# Replace with the path to your own image
image = load_image("input.jpg")

image = crop_image_to_nearest_divisible_by_8(to_tensor(image)).unsqueeze(0)

# Inference only, so skip building the autograd graph
with torch.no_grad():
    upscaled_image = vae(image).sample

# Save the upscaled image (2x the input resolution)
utils.save_image(upscaled_image, "test.png")
```