WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 18
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9, 2024 • 34
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29, 2024 • 12
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29, 2024 • 19
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 259
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration Paper • 2311.04257 • Published Nov 7, 2023 • 21
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 112