SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 8 days ago • 118
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • Published 25 days ago • 109
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control Paper • 2406.16038 • Published Jun 23, 2024 • 1
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper • 2501.15830 • Published Jan 27 • 14
Exploring the Potential of Encoder-free Architectures in 3D LMMs Paper • 2502.09620 • Published 15 days ago • 25