VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published 10 days ago • 12
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published 13 days ago • 21
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 65
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 65
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 22
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 13
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11 • 27
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 243
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 22