LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects Paper • 2504.19838 • Published 4 days ago • 21
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Paper • 2504.16080 • Published 10 days ago • 15
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Paper • 2503.21758 • Published Mar 27 • 21
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Paper • 2503.21749 • Published Mar 27 • 26
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 61
Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning Paper • 2503.11646 • Published Mar 14 • 36
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 50
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Paper • 2501.13920 • Published Jan 23 • 17
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23 • 42
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published Jan 3 • 56
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 38
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 47
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024 • 57
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9, 2024 • 18
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published Oct 10, 2024 • 47
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 38
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28, 2024 • 22