High-quality image generation in 3 seconds
Generate realistic speech from text and reference audio