When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding • Paper • 2506.05551 • Published Jun 5
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models • Paper • 2506.01667 • Published Jun 2
VidText: Towards Comprehensive Evaluation for Video Text Understanding • Paper • 2505.22810 • Published May 28