VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation Paper • 2503.21214 • Published 29 days ago