CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Paper • 2405.05949 • Published May 9, 2024 • 3
Multi-Reward as Condition for Instruction-based Image Editing Paper • 2411.04713 • Published Nov 6, 2024 • 1
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published Mar 18 • 8
Vidi: Large Multimodal Models for Video Understanding and Editing Paper • 2504.15681 • Published 20 days ago • 15
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Paper • 2505.02370 • Published 7 days ago • 12