MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6, 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding Paper • 2503.18478 • Published Mar 24, 2025