OmniGen2: Exploration to Advanced Multimodal Generation • Paper • 2506.18871 • Published 3 days ago • 64
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding • Paper • 2506.05551 • Published 20 days ago • 4
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion • Paper • 2506.01111 • Published 24 days ago • 29
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models • Paper • 2506.01667 • Published 24 days ago • 21
Temporal Regularization Makes Your Video Generator Stronger • Paper • 2503.15417 • Published Mar 19 • 22
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization • Paper • 2503.08619 • Published Mar 11 • 20
OmniCreator: Self-Supervised Unified Generation with Universal Editing • Paper • 2412.02114 • Published Dec 3, 2024 • 14
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation • Paper • 2412.02259 • Published Dec 3, 2024 • 60