zzfive's Collections
VLA

updated 2 days ago
  • A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

    Paper • 2507.01925 • Published Jul 2 • 35

  • DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

    Paper • 2507.04447 • Published Jul 6 • 42

  • A Survey on Vision-Language-Action Models for Autonomous Driving

    Paper • 2506.24044 • Published Jun 30 • 14

  • EmbRACE-3K: Embodied Reasoning and Action in Complex Environments

    Paper • 2507.10548 • Published 25 days ago • 33

  • Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

    Paper • 2507.15597 • Published 18 days ago • 33

  • ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

    Paper • 2507.16815 • Published 17 days ago • 35

  • villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

    Paper • 2507.23682 • Published 8 days ago • 22

  • InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

    Paper • 2507.17520 • Published 16 days ago • 12