arxiv:2506.18095

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Published on Jun 22
· Submitted by jymcc on Jun 26
#1 Paper of the day
Authors:

Abstract

AI-generated summary: ShareGPT-4o-Image and Janus-4o enable open research in photorealistic, instruction-aligned image generation through a large dataset and a multimodal model.

Recent advances in multimodal generative models have unlocked photorealistic, instruction-aligned image generation, yet leading systems like GPT-4o-Image remain proprietary and inaccessible. To democratize these capabilities, we present ShareGPT-4o-Image, the first dataset comprising 45K text-to-image and 46K text-and-image-to-image samples, all synthesized with GPT-4o's image generation in order to distill its advanced image generation abilities. Leveraging this dataset, we develop Janus-4o, a multimodal large language model capable of both text-to-image and text-and-image-to-image generation. Janus-4o not only significantly improves text-to-image generation over its predecessor, Janus-Pro, but also newly supports text-and-image-to-image generation. Notably, it achieves impressive performance in text-and-image-to-image generation trained from scratch, using only 91K synthetic samples and 6 hours of training on a single machine with 8 A800 GPUs. We hope the release of ShareGPT-4o-Image and Janus-4o will foster open research in photorealistic, instruction-aligned image generation.
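The figures in the abstract fit together with simple arithmetic, and the two task types differ only in whether an input image accompanies the text prompt. The sketch below illustrates both points. The `Sample` record and its field names (`prompt`, `output_image`, `input_image`) are hypothetical and chosen for illustration; they are not the released dataset's schema. The counts treat 45K and 46K as round numbers, as reported.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout for illustration only; NOT the dataset's actual schema.
@dataclass
class Sample:
    prompt: str                        # text instruction given to the model
    output_image: str                  # path to the GPT-4o-generated target image (hypothetical field)
    input_image: Optional[str] = None  # set only for text-and-image-to-image samples

    @property
    def task(self) -> str:
        # A sample is text-and-image-to-image exactly when an input image is present.
        return "text-and-image-to-image" if self.input_image is not None else "text-to-image"


# Round counts and training budget as reported in the abstract.
T2I_SAMPLES = 45_000   # text-to-image
TI2I_SAMPLES = 46_000  # text-and-image-to-image
TRAIN_HOURS = 6
NUM_GPUS = 8           # A800 GPUs on one machine

total_samples = T2I_SAMPLES + TI2I_SAMPLES  # matches the "91K synthetic samples" figure
gpu_hours = TRAIN_HOURS * NUM_GPUS          # total A800 GPU-hours of training

if __name__ == "__main__":
    print(total_samples, gpu_hours)
    print(Sample("a red bicycle", "out.png").task)
    print(Sample("make the bicycle blue", "out.png", input_image="in.png").task)
```

Running it prints `91000 48`, then the two task labels, showing that the 91K total is simply the sum of the two subsets and that the whole training run costs about 48 A800 GPU-hours.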

Community

Paper submitter

Excited to share our latest work: ShareGPT-4o-Image 🎉

  • We introduce ShareGPT-4o-Image: a dataset of 91K GPT-4o-synthesized samples for aligning multimodal models with GPT-4o's image generation capabilities. It covers both text-to-image and text-and-image-to-image tasks. 🖼️
  • Accompanying it is Janus-4o, a unified multimodal LLM that excels at both text-to-image and text-and-image-to-image generation. 🚀
  • Remarkably, training on ShareGPT-4o-Image significantly boosts image generation capabilities while requiring only 6 hours on a single machine with 8 A800 GPUs. ⚡️


Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 1