OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 31
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking Paper • 2303.16727 • Published Mar 29, 2023
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 35
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling Paper • 2501.12386 • Published Jan 21 • 1
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency Paper • 2501.10110 • Published Jan 17
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models Paper • 2501.08453 • Published Jan 14 • 1
InternVideo: General Video Foundation Models via Generative and Discriminative Learning Paper • 2212.03191 • Published Dec 6, 2022
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis Paper • 2103.05630 • Published Mar 9, 2021
Unmasked Teacher: Towards Training-Efficient Video Foundation Models Paper • 2303.16058 • Published Mar 28, 2023
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer Paper • 2211.09552 • Published Nov 17, 2022
Harvest Video Foundation Models via Efficient Post-Pretraining Paper • 2310.19554 • Published Oct 30, 2023
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 38
VideoMamba: State Space Model for Efficient Video Understanding Paper • 2403.06977 • Published Mar 11, 2024 • 31
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark Paper • 2311.17005 • Published Nov 28, 2023 • 2
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 9
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023 • 42
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 23
InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4