AOT-GAN: Image Editing
AOT-GAN (Aggregated Contextual Transformations GAN) is a generative adversarial network designed for challenging image inpainting tasks such as large occlusions and complex structural gaps. It integrates multi-scale context aggregation and object-aware mechanisms through AOT blocks, which combine dilated convolutions for long-range dependency capture with multi-scale fusion to improve structural coherence in content such as facial features and architectural textures. The model uses attention-guided adversarial training to focus dynamically on missing regions, and it is reported to outperform earlier methods such as DeepFill on the Places2 and CelebA datasets in PSNR and SSIM, especially for high-resolution images. It balances generation quality against computational cost, which makes it a good fit for photo restoration, film editing, and medical image reconstruction.
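The long-range claim for dilated convolutions can be made concrete with a little arithmetic. The sketch below computes how many pixels a 3x3 kernel spans at several dilation rates and how a layer's channels might be divided across parallel branches before fusion. The rates (1, 2, 4, 8) and the 256-channel width are illustrative assumptions, not values read from the deployed model, and the helper names are hypothetical.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span in pixels covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def split_channels(total: int, branches: int) -> list:
    """Evenly divide channels across parallel branches."""
    if total % branches != 0:
        raise ValueError("channel count must be divisible by branch count")
    return [total // branches] * branches


# Assumed configuration, for illustration only.
RATES = [1, 2, 4, 8]
CHANNELS = 256

spans = {rate: effective_kernel(3, rate) for rate in RATES}
branch_channels = split_channels(CHANNELS, len(RATES))

for rate, width in zip(RATES, branch_channels):
    print(f"rate={rate}: {width} channels, {spans[rate]}x{spans[rate]} span")
```

Under these assumptions the branch with the largest rate covers a 17x17 neighbourhood with the same nine weights per filter as a plain 3x3 kernel, which is why mixing rates gives both fine detail and wide context at modest cost.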
Source model
- Input shape: [1x3x512x512], [1x1x512x512]
- Number of parameters: 14.51M
- Model size: 61.29 MB
- Output shape: 1x3x512x512
The source model can be found here
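The shapes listed above suggest a 3-channel image tensor and a 1-channel mask tensor, both in NCHW layout. A minimal NumPy sketch of preparing such inputs and compositing an output follows. Several details are assumptions not confirmed by this card: that the first input is the RGB image and the second the mask, that pixels are scaled to [-1, 1], and that mask value 1 marks the region to fill. The function names `to_model_inputs` and `composite` are hypothetical.

```python
import numpy as np

SIZE = 512  # spatial size taken from the listed input shapes


def to_model_inputs(image_hwc: np.ndarray, mask_hw: np.ndarray):
    """Convert an RGB uint8 image and a mask into NCHW float32 arrays.

    Assumed conventions: pixels scaled to [-1, 1]; mask 1 = pixel to fill.
    """
    if image_hwc.shape != (SIZE, SIZE, 3) or mask_hw.shape != (SIZE, SIZE):
        raise ValueError("expected a 512x512 RGB image and a 512x512 mask")
    image = image_hwc.astype(np.float32) / 127.5 - 1.0
    image = image.transpose(2, 0, 1)[None]               # 1x3x512x512
    mask = (mask_hw > 0).astype(np.float32)[None, None]  # 1x1x512x512
    return image, mask


def composite(prediction: np.ndarray, image: np.ndarray, mask: np.ndarray):
    """Keep known pixels from the input and take filled pixels from the output."""
    return prediction * mask + image * (1.0 - mask)


# Example: a black image with a 256x256 hole in the centre.
rgb = np.zeros((SIZE, SIZE, 3), dtype=np.uint8)
hole = np.zeros((SIZE, SIZE), dtype=np.uint8)
hole[128:384, 128:384] = 255

image, mask = to_model_inputs(rgb, hole)
print(image.shape, mask.shape)
```

Compositing is a common post-processing step for inpainting models in general; whether the deployable model already applies it internally is not stated here, so check the conversion notes in Model Farm before relying on it.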
Performance Reference
Please search for the model by name in Model Farm.
Inference & Model Conversion
Please search for the model by name in Model Farm.
License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE