Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10
M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark Paper • 2406.05343 • Published Jun 8, 2024
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Paper • 2503.14324 • Published Mar 18