RIPT-VLA: Interactive Post-Training for Vision-Language-Action Models (arxiv.org/abs/2505.17016)
Authors: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl
Codebase: GitHub - RIPT-VLA
Website: Project Page
RIPT-VLA enables interactive post-training for any pretrained Vision-Language-Action (VLA) model using only sparse binary success rewards.
With K-rollout interaction, dynamic sampling, and leave-one-out advantage estimation, RIPT-VLA achieves state-of-the-art performance in extremely low-data regimes.
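To make the last two ingredients concrete, here is a minimal sketch of leave-one-out advantage estimation over K binary rollout rewards, combined with a simple dynamic-sampling filter. The function names, the toy reward groups, and the exact filtering rule (skip a group when all K rollouts share the same outcome) are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each rollout: its reward minus the mean reward
    of the other K-1 rollouts collected from the same start state."""
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def keep_group(rewards: List[float]) -> bool:
    """Assumed dynamic-sampling rule: a group where every rollout
    succeeded (or every rollout failed) yields all-zero advantages,
    so it carries no learning signal and is skipped."""
    return len(set(rewards)) > 1


# Hypothetical binary success rewards for K = 4 rollouts per task.
groups: Dict[str, List[float]] = {
    "task_a": [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: kept
    "task_b": [1.0, 1.0, 1.0, 1.0],  # uniform outcomes: skipped
}

for name, rewards in groups.items():
    if not keep_group(rewards):
        print(f"{name}: skipped (uniform outcomes)")
        continue
    print(f"{name}: {leave_one_out_advantages(rewards)}")
```

Because each rollout's baseline is computed from its sibling rollouts, no learned value function is needed, and the advantages within a group sum to zero.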
Model Summary
RIPT-VLA takes a pretrained VLA model (e.g., QueST or OpenVLA-OFT) and improves its performance by fine-tuning it with reinforcement learning based on success/failure signals only; no dense rewards or value functions are required.
Supported models:
- QueST (small, efficient)
- OpenVLA-OFT (large-scale, high-capacity)
Model Use
Intended Use
- Research on post-training VLA models via RL
- Evaluation on LIBERO benchmarks (LIBERO-90, Goal, Object, Spatial, Long)
- Studying low-data reinforcement learning settings
Checkpoints
All checkpoints are hosted in this repository.
QueST Checkpoints
| Suite | SFT Checkpoint | RIPT Checkpoint |
|---|---|---|
| LIBERO-90 | ✓ | ✓ |
| LIBERO-GOAL | ✓ | ✓ |
| LIBERO-LONG | ✓ | ✓ |
| LIBERO-OBJECT | ✓ | ✓ |
| LIBERO-SPATIAL | ✓ | ✓ |
Each QueST checkpoint is ~80MB.
OpenVLA-OFT Checkpoints
| Suite | SFT Scale Head | RIPT LoRA Adaptor |
|---|---|---|
| LIBERO-GOAL | ✓ | ✓ |
| LIBERO-LONG | ✓ | ✓ |
| LIBERO-OBJECT | ✓ | ✓ |
| LIBERO-SPATIAL | ✓ | ✓ |
OpenVLA-OFT scale heads are ~300MB; RIPT LoRA adaptors are ~1GB.
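Since the checkpoints live in this repository, one way to fetch them is with `snapshot_download` from the `huggingface_hub` package. The sketch below assumes the repository id is `tanshh97/RIPT_VLA` and that it is a standard model repository. The helper name `download_checkpoints` and the default target directory are hypothetical, and the file layout inside the repository is not described here, so any `allow_patterns` filter you pass is your own guess.

```python
from typing import List, Optional

# Assumed repository id for this model card.
REPO_ID = "tanshh97/RIPT_VLA"


def download_checkpoints(
    local_dir: str = "ript_vla_checkpoints",
    allow_patterns: Optional[List[str]] = None,
) -> str:
    """Download the repository (or a filtered subset) and return the local path.

    huggingface_hub is imported lazily so that this module can be imported
    even when the package is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns,  # None downloads every file
    )
```

Calling `download_checkpoints()` with no arguments fetches the whole repository, which includes the ~1GB LoRA adaptors for each OpenVLA-OFT suite. Pass glob patterns via `allow_patterns` to limit the download once you know which files you need.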
How to Use
For usage, see INSTALL.md in the main GitHub repo.
Base model (from the model tree for tanshh97/RIPT_VLA): moojink/openvla-7b-oft-finetuned-libero-10