AOT-GAN: Image Editing

AOT-GAN (Aggregated Contextual Transformations GAN) is a generative adversarial network designed for challenging image inpainting tasks such as large occlusions and complex structural gaps. Its generator is built from AOT blocks, which run dilated convolutions at several rates in parallel to capture long-range dependencies and then fuse the resulting multi-scale context, improving structural coherence in content such as facial features and architectural textures. Attention-guided adversarial training focuses the model on the missing regions, and it is reported to outperform earlier methods (e.g., DeepFill) on the Places2 and CelebA datasets in PSNR and SSIM, especially on high-resolution images. It is suited to photo restoration, film editing, and medical image reconstruction, and it balances generation quality against computational cost.
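To make the role of the dilated convolutions concrete, the short sketch below computes how far a single kernel reaches at different dilation rates, using the standard relation k_eff = k + (k - 1)(d - 1). The 3x3 kernel size and the rates 1, 2, 4, 8 are assumptions chosen for illustration; this card does not state the values used in this particular model.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span in pixels covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed for illustration only: 3x3 kernels and four parallel dilation rates.
KERNEL_SIZE = 3
DILATION_RATES = (1, 2, 4, 8)

# Map each dilation rate to the width of the window its kernel covers.
spans = {rate: effective_kernel_size(KERNEL_SIZE, rate) for rate in DILATION_RATES}

for rate, span in spans.items():
    print(f"dilation {rate}: a {KERNEL_SIZE}x{KERNEL_SIZE} kernel spans {span}x{span} pixels")
```

Under these assumptions, each branch uses the same number of weights but sees a different amount of context, which is why fusing the branches gives a block both fine local detail and a wider view of the surrounding structure.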

Source model

  • Input shape: [1x3x512x512] (image), [1x1x512x512] (mask)
  • Number of parameters: 14.51M
  • Model size: 61.29 MB
  • Output shape: 1x3x512x512
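As a quick sanity check on the figures above, the sketch below works out the memory footprint of each tensor and of the weights. It assumes float32 storage (4 bytes per element) and an NCHW layout (batch, channels, height, width), and it treats the first input as the RGB image and the second as the single-channel mask; none of these are stated explicitly on this card.

```python
from math import prod

# Shapes copied from the list above, read as (batch, channels, height, width).
IMAGE_SHAPE = (1, 3, 512, 512)   # assumed to be the RGB image
MASK_SHAPE = (1, 1, 512, 512)    # assumed to be the inpainting mask
OUTPUT_SHAPE = (1, 3, 512, 512)

NUM_PARAMETERS = 14.51e6         # 14.51M, from the list above
BYTES_PER_FLOAT32 = 4            # assumption: float32 storage


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes needed to hold a dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


image_bytes = tensor_bytes(IMAGE_SHAPE)
mask_bytes = tensor_bytes(MASK_SHAPE)
output_bytes = tensor_bytes(OUTPUT_SHAPE)

# Raw weight footprint in decimal megabytes, before any file-format overhead.
weight_megabytes = NUM_PARAMETERS * BYTES_PER_FLOAT32 / 1e6

print(f"image:   {image_bytes / 2**20:.0f} MiB")
print(f"mask:    {mask_bytes / 2**20:.0f} MiB")
print(f"output:  {output_bytes / 2**20:.0f} MiB")
print(f"weights: {weight_megabytes:.2f} MB")
```

The raw weight footprint under the float32 assumption comes to about 58 MB, which is roughly consistent with the listed model size if that figure is in megabytes.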

The source model can be found here

Performance Reference

Please search for the model by name in Model Farm.

Inference & Model Conversion

Please search for the model by name in Model Farm.

License
