Submitted by akhaliq 83 Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs · 8 authors 3
Submitted by akhaliq 35 MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators · 9 authors 2
Submitted by akhaliq 27 SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing · 10 authors
Submitted by akhaliq 27 ByteEdit: Boost, Comply and Accelerate Generative Image Editing · 14 authors 1
Submitted by akhaliq 24 BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion · 5 authors
Submitted by akhaliq 23 MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · 8 authors
Submitted by akhaliq 18 PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations · 11 authors
Submitted by akhaliq 15 MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation · 6 authors 2
Submitted by akhaliq 13 Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models · 5 authors