OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published 2 days ago • 64
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published 20 days ago • 4
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published 24 days ago • 29
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published 24 days ago • 21
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published Mar 19 • 22
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published Mar 19 • 22
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20