A unified model for multimodal understanding, text-to-image generation, and image editing.
Demo for multimodal understanding and generation