
💪 RIPT-VLA: Interactive Post-Training for Vision-Language-Action Models (arxiv.org/abs/2505.17016)

Authors: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl
Codebase: GitHub – RIPT-VLA
Website: Project Page

RIPT-VLA enables interactive post-training for any pretrained Vision-Language-Action (VLA) model using only sparse binary success rewards.
With K-rollout interaction, dynamic sampling, and leave-one-out advantage estimation, RIPT-VLA achieves state-of-the-art performance in extremely low-data regimes.


🧠 Model Summary

RIPT-VLA takes a pretrained VLA model (e.g., QueST or OpenVLA-OFT) and improves its performance by fine-tuning it with reinforcement learning based only on success/failure signals; no dense rewards or value functions are required.
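
The snippet below is a minimal, illustrative sketch (not the authors' implementation) of two ingredients named above: leave-one-out advantage estimation over K rollouts with binary success rewards, and a simple dynamic sampling filter that keeps only contexts with a mix of successes and failures, since contexts whose rollouts all succeed or all fail give zero advantage. Function names are hypothetical; see the paper and codebase for the exact procedure.

```python
import numpy as np

def leave_one_out_advantages(rewards):
    """Leave-one-out advantage for K rollouts sampled from the same task context."""
    rewards = np.asarray(rewards, dtype=np.float32)   # binary success rewards, shape (K,)
    k = rewards.shape[0]
    baselines = (rewards.sum() - rewards) / (k - 1)   # mean reward of the other K-1 rollouts
    return rewards - baselines

def keep_context(rewards):
    """Dynamic sampling filter: keep a context only if its rollouts contain both
    successes and failures, so the leave-one-out advantages are not all zero."""
    return 0 < sum(rewards) < len(rewards)

# Example: K = 4 rollouts of one task, two of which succeed.
rollout_rewards = [1, 0, 1, 0]
if keep_context(rollout_rewards):
    print(leave_one_out_advantages(rollout_rewards))  # ~[ 0.667 -0.667  0.667 -0.667]
```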

Supported models:

  • ✅ QueST (small, efficient)
  • ✅ OpenVLA-OFT (large-scale, high-capacity)

🧪 Model Use

✅ Intended Use

  • Research on post-training VLA models via RL
  • Evaluation on LIBERO benchmarks (LIBERO-90, Goal, Object, Spatial, Long)
  • Studying low-data reinforcement learning settings

📦 Checkpoints

All checkpoints are hosted in this repository.

βœ”οΈ QueST Checkpoints

| Suite          | SFT Checkpoint | RIPT Checkpoint |
| -------------- | -------------- | --------------- |
| LIBERO-90      | ✅             | ✅              |
| LIBERO-GOAL    | ✅             | ✅              |
| LIBERO-LONG    | ✅             | ✅              |
| LIBERO-OBJECT  | ✅             | ✅              |
| LIBERO-SPATIAL | ✅             | ✅              |

Each QueST checkpoint is ~80MB.

βœ”οΈ OpenVLA-OFT Checkpoints

| Suite          | SFT Scale Head | RIPT LoRA Adaptor |
| -------------- | -------------- | ----------------- |
| LIBERO-GOAL    | ✅             | ✅                |
| LIBERO-LONG    | ✅             | ✅                |
| LIBERO-OBJECT  | ✅             | ✅                |
| LIBERO-SPATIAL | ✅             | ✅                |

OpenVLA-OFT scale heads are ~300MB; RIPT LoRA adaptors are ~1GB.
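
As a rough sketch of how a RIPT LoRA adaptor could be attached to a base OpenVLA model with the peft library. The identifiers below are illustrative assumptions; the actual loading procedure, base checkpoint, scale head handling, and folder layout are defined by the RIPT-VLA codebase.

```python
from transformers import AutoModelForVision2Seq
from peft import PeftModel

# Illustrative identifiers only; consult the RIPT-VLA repo for the real base
# checkpoint name, the adaptor folder layout, and how the scale head is loaded.
base = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",        # assumed base OpenVLA weights
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "path/to/ript_lora_adaptor")
model = model.merge_and_unload()  # optional: fold the LoRA weights into the base model
```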


🛠️ How to Use

For installation and usage instructions, see INSTALL.md in the main RIPT-VLA GitHub repository.
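
To fetch the checkpoints themselves, a minimal sketch using huggingface_hub is shown below; the subfolder pattern in the second call is an assumption, so check the repository's file listing for the actual layout.

```python
from huggingface_hub import snapshot_download

# Download every checkpoint in this repository to the local Hugging Face cache.
local_dir = snapshot_download(repo_id="tanshh97/RIPT_VLA")
print(local_dir)

# Or fetch only a subset by filename pattern (folder name below is an assumption).
quest_dir = snapshot_download(repo_id="tanshh97/RIPT_VLA", allow_patterns=["quest/*"])
```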
