Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 208
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published Jan 23 • 7
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23 • 22
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 24