SmolVLA: A vision-language-action model for affordable and efficient robotics


Designed by Hugging Face.

This model has 450M parameters in total. You can use it with the LeRobot library.

Install LeRobot with the smolvla extra dependencies:

pip install -e ".[smolvla]"
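
The editable install above assumes a local clone of the LeRobot repository. A minimal setup, assuming the official GitHub repository, looks like:

git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"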

Example of fine-tuning the pretrained SmolVLA model (smolvla_base):

python lerobot/scripts/train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of training the SmolVLA architecture with the pretrained VLM backbone and the action expert initialized from scratch:

python lerobot/scripts/train.py \
--policy.type=smolvla \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of loading the pretrained SmolVLA model outside the LeRobot training framework:

# Import path may differ across LeRobot versions.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
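
A minimal inference sketch is shown below. The observation keys, image shape, state dimension, and task string are placeholders and must match the robot/dataset the policy was fine-tuned on.

import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # path may vary by LeRobot version

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# Dummy observation: keys, shapes, and the task string are illustrative only
# and must match the policy's expected input features.
batch = {
    "observation.images.top": torch.rand(1, 3, 256, 256),  # (batch, channels, height, width)
    "observation.state": torch.rand(1, 6),                  # proprioceptive state
    "task": ["pick up the cube"],                           # natural-language instruction
}

with torch.no_grad():
    action = policy.select_action(batch)
print(action.shape)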