Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23 • 33
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published Jun 5 • 5
view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 28 days ago • 606
Rope to Nope and Back Again: A New Hybrid Attention Strategy Paper • 2501.18795 • Published Jan 30 • 6