SmolVLA: A vision-language-action model for affordable and efficient robotics


Designed by Hugging Face.

This model has 450M parameters in total. You can use it with the LeRobot library.

Install LeRobot with the smolvla extra dependencies:

pip install -e ".[smolvla]"
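
The editable install above assumes a local clone of the LeRobot repository. A minimal setup, assuming the official GitHub repository, looks like:

git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"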

Example of fine-tuning the pretrained SmolVLA model (smolvla_base):

python lerobot/scripts/train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of training the SmolVLA architecture with the pretrained VLM backbone and the action expert initialized from scratch:

python lerobot/scripts/train.py \
--policy.type=smolvla \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of loading the pretrained SmolVLA model outside the LeRobot training framework:

# Import path may differ across LeRobot versions.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
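
A minimal inference sketch is shown below. The observation keys, image shape, state dimension, and task string are placeholders and must match the robot/dataset the policy was fine-tuned on.

import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # path may vary by LeRobot version

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# Dummy observation: keys, shapes, and the task string are illustrative only
# and must match the policy's expected input features.
batch = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # (batch, channels, height, width)
    "observation.state": torch.rand(1, 6),                  # proprioceptive state
    "task": ["pick up the cube"],                           # natural-language instruction
}

with torch.no_grad():
    action = policy.select_action(batch)
print(action.shape)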