EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Paper • 2407.12781 • Published Jul 17
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Paper • 2407.12679 • Published Jul 17