MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents Paper • 2408.04203 • Published Aug 8, 2024 • 1
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22, 2025 • 32
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025 • 51