emerging trend: models that can understand image + text and generate image + text
don't miss out ⤵️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO (Gen-Verse/MMaDA)
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE (ByteDance-Seed/BAGEL)
both by ByteDance! 😱
I keep track of all any-to-any (any input → any output) models here: merve/any-to-any-models-6822042ee8eb7fb5e38f9b62