AOT-GAN: Image Editing

AOT-GAN (Aggregated Contextual Transformations GAN) is a generative adversarial network designed for challenging image inpainting tasks such as large occlusions and complex structural gaps. Its generator is built from AOT blocks, which aggregate context at multiple scales: each block applies dilated convolutions with different dilation rates in parallel to capture long-range dependencies, then fuses the results to improve structural coherence (e.g., facial features, architectural textures). Adversarial training directs the discriminator toward the missing regions, and the model is reported to outperform earlier inpainting methods (e.g., DeepFill) on the Places2 and CelebA datasets in PSNR and SSIM, especially on high-resolution images. It is suited to photo restoration, film editing, and medical image reconstruction, and it balances generation quality against computational efficiency.
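To make the effect of dilation concrete, the short sketch below computes how much of the image a single 3x3 kernel spans at several dilation rates. The rates `(1, 2, 4, 8)` and the helper names are assumptions chosen for illustration; this model card does not list the rates used by the deployed model.

```python
# Assumed dilation rates for the parallel branches of one AOT block.
# Illustrative values only: the model card does not state them.
ASSUMED_RATES = (1, 2, 4, 8)


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (in pixels) spanned by one dilated convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def branch_extents(rates=ASSUMED_RATES, kernel_size: int = 3) -> dict:
    """Map each dilation rate to the extent spanned by its kernel."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    for rate, extent in branch_extents().items():
        print(f"dilation {rate}: spans {extent}x{extent} pixels")
```

Under these assumed rates, the widest branch spans 17x17 pixels with the same nine weights as an ordinary 3x3 convolution, which is why combining branches gives both local detail and distant context at little extra cost.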

Source model

Input shape: [1x3x512x512] (image), [1x1x512x512] (mask)
Number of parameters: 14.51M
Model size: 61.29 MB
Output shape: 1x3x512x512
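
The two input shapes above suggest an RGB image tensor and a single-channel mask tensor. The following is a minimal sketch of how such inputs might be prepared with NumPy. The function name `prepare_inputs`, the scaling to [-1, 1], the convention that mask value 1 marks missing pixels, and the filling of missing pixels with 1.0 are all assumptions, not documented behaviour of the deployable model; check the Model Farm inference code for the actual preprocessing.

```python
import numpy as np

SIZE = 512  # spatial size taken from the input shapes listed above


def prepare_inputs(image_hwc: np.ndarray, mask_hw: np.ndarray):
    """Build the two input tensors from an RGB image and a hole mask.

    image_hwc: uint8 array of shape (512, 512, 3)
    mask_hw:   array of shape (512, 512), nonzero where pixels are missing
    """
    assert image_hwc.shape == (SIZE, SIZE, 3)
    assert mask_hw.shape == (SIZE, SIZE)

    # Assumption: pixel values are scaled from [0, 255] to [-1, 1].
    image = image_hwc.astype(np.float32) / 127.5 - 1.0
    image = image.transpose(2, 0, 1)[np.newaxis]  # -> 1x3x512x512

    # Binary mask with batch and channel axes added -> 1x1x512x512
    mask = (mask_hw > 0).astype(np.float32)[np.newaxis, np.newaxis]

    # Assumption: missing pixels are overwritten with 1.0 before inference.
    masked_image = image * (1.0 - mask) + mask
    return masked_image, mask


if __name__ == "__main__":
    rgb = np.zeros((SIZE, SIZE, 3), dtype=np.uint8)
    hole = np.zeros((SIZE, SIZE), dtype=np.uint8)
    hole[100:200, 100:200] = 255  # square region to inpaint
    x, m = prepare_inputs(rgb, hole)
    print(x.shape, m.shape)
```

Both arrays are `float32` and use the batch-first, channel-first layout implied by the listed shapes.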

The source model can be found here

Performance Reference

Please search for the model by name in Model Farm.

Inference & Model Conversion

Please search for the model by name in Model Farm.

License

Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE