SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 114
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots Paper • 2312.14457 • Published Dec 22, 2023 • 1
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression Paper • 2412.03293 • Published Dec 4, 2024
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations Paper • 2405.06039 • Published May 9, 2024 • 1
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM Paper • 2410.15549 • Published Oct 21, 2024
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation Paper • 2502.02175 • Published Feb 4, 2025
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper • 2506.17561 • Published Jun 2025
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour Paper • 2503.02572 • Published Mar 4, 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Paper • 2505.18719 • Published May 24, 2025
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 2025 • 17
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Paper • 2505.03673 • Published May 6, 2025 • 1
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28, 2025 • 2
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Paper • 2507.01925 • Published Jul 2025 • 32
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20, 2025 • 41
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper • 2507.01016 • Published Jul 2025 • 1
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Paper • 2506.13725 • Published Jun 16, 2025
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent Paper • 2501.18867 • Published Jan 31, 2025
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6, 2025 • 9
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Paper • 2505.22159 • Published May 28, 2025
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 2025 • 40
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Paper • 2503.10631 • Published Mar 13, 2025