SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 114
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots Paper • 2312.14457 • Published Dec 22, 2023 • 1
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression Paper • 2412.03293 • Published Dec 4, 2024
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations Paper • 2405.06039 • Published May 9, 2024 • 1
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM Paper • 2410.15549 • Published Oct 21, 2024
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation Paper • 2502.02175 • Published Feb 4, 2025
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper • 2506.17561 • Published Jun 2025
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour Paper • 2503.02572 • Published Mar 4, 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Paper • 2505.18719 • Published May 24, 2025
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 2025 • 17
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Paper • 2505.03673 • Published May 6, 2025 • 1
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28, 2025 • 2
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Paper • 2507.01925 • Published Jul 2025 • 32
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20, 2025 • 41
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper • 2507.01016 • Published Jul 2025 • 1
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Paper • 2506.13725 • Published Jun 16, 2025
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent Paper • 2501.18867 • Published Jan 31, 2025
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6, 2025 • 9
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Paper • 2505.22159 • Published May 28, 2025
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 2025 • 40
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Paper • 2503.10631 • Published Mar 13, 2025