Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published 3 days ago • 26
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 1 day ago • 32
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published Nov 19 • 93
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20 • 92
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 8 days ago • 14
CASA Collection CASA: Cross-Attention as Self-Attention for Efficient Vision-Language Fusion on long-context streaming inputs • 6 items • Updated 7 days ago • 6
ARC-Encoders Collection Pretrained ARC-Encoders and a fine-tuning dataset: context compression for unmodified LLMs. • 7 items • Updated 6 days ago • 4