
RLinf: Reinforcement Learning Infrastructure for Agentic AI

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

RLinf-overview

Model Description

The RLinf-openvlaoft-libero series is trained on Haozhan72/Openvla-oft-SFT-libero-xxx-traj1 (covering libero10, libero-object, libero-goal, and libero-spatial), using the same base models and training datasets as verl. Training with RLinf yields state-of-the-art performance, raising the average score across the four suites from 34.33 for the SFT baselines to 97.83 (see Benchmark Results).

We use a mask to focus on valid action tokens, and compute token-level loss based on the Group Relative Policy Optimization (GRPO) advantage function, in order to enhance the model’s performance on spatial reasoning, object generalization, instruction generalization, and long-horizon tasks.
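The masked, token-level objective described above can be sketched as follows. This is a minimal illustration, not the RLinf implementation: it assumes rewards are normalized by the mean and standard deviation of their rollout group and that the loss uses a PPO-style clipped probability ratio, neither of which this model card spells out. The function names, `clip_eps`, and the toy inputs are hypothetical.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: normalize each rollout's reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_token_loss(logprobs, old_logprobs, advantages, mask, clip_eps=0.2):
    """Clipped surrogate loss averaged over valid (mask == 1) action tokens only.

    logprobs, old_logprobs, mask: one row per rollout, one entry per token.
    advantages: one scalar per rollout, shared by all of its tokens.
    """
    total, count = 0.0, 0
    for lp_row, old_row, adv, m_row in zip(logprobs, old_logprobs, advantages, mask):
        for lp, old, m in zip(lp_row, old_row, m_row):
            if not m:
                continue  # padding or non-action token: contributes nothing
            ratio = math.exp(lp - old)
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            total += -min(ratio * adv, clipped * adv)
            count += 1
    return total / max(count, 1)


# Toy group of four rollouts with binary task rewards (illustrative values).
demo_adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
demo_new = [[-0.9, -1.1, 0.0]] * 4
demo_old = [[-1.0, -1.0, 0.0]] * 4
demo_mask = [[1, 1, 0]] * 4  # the last position is padding
print(masked_token_loss(demo_new, demo_old, demo_adv, demo_mask))
```

Because masked positions are skipped before the ratio is computed, padding tokens cannot influence either the loss value or its normalization.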

Evaluation and Results

We trained and evaluated four models using RLinf, one for each LIBERO suite (Object, Spatial, Goal, and Long).

Benchmark Results

All SFT models are from SimpleVLA-RL.

  • Recommended sampling settings for evaluation: libero seed=0; episode number=500; do_sample=False
Model                 Object   Spatial   Goal    Long    Average
sft models            25.60    56.45     45.59   9.68    34.33
trained with RLinf    98.99    98.99     98.99   94.35   97.83
RLinf-libero-result
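The Average column in the table above is consistent with an unweighted mean of the four suite scores. A short sketch that recomputes it from the reported numbers (the variable and function names are illustrative):

```python
# Scores copied from the benchmark table: (Object, Spatial, Goal, Long), reported Average.
RESULTS = {
    "sft models": ([25.60, 56.45, 45.59, 9.68], 34.33),
    "trained with RLinf": ([98.99, 98.99, 98.99, 94.35], 97.83),
}


def suite_average(scores):
    """Unweighted mean of the per-suite scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, (scores, reported) in RESULTS.items():
    print(f"{name}: computed {suite_average(scores):.2f}, reported {reported:.2f}")
```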

How to Use

Please integrate the provided model with the RLinf codebase. To do so, modify the following parameters in the configuration file examples/embodiment/config/libero_spatial_grpo_openvlaoft.yaml:

  • Set actor.checkpoint_load_path, actor.tokenizer.tokenizer_model, and rollout.model_dir to the path of the model checkpoint.

Note: If you intend to evaluate the model directly, make sure to set actor.model.is_lora to false.
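As a rough illustration of the settings above, the sketch below expands the dotted parameter names into a nested mapping. It assumes the dotted names correspond to nested keys in the YAML file, which is an inference from how they are written rather than something confirmed here. `set_dotted` is a hypothetical helper, not an RLinf function, and `CHECKPOINT_DIR` is a placeholder for wherever you downloaded the checkpoint.

```python
def set_dotted(config, dotted_key, value):
    """Set config['a']['b']['c'] = value given the key 'a.b.c' (hypothetical helper)."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


CHECKPOINT_DIR = "/path/to/model/checkpoint"  # placeholder, replace with your local path

overrides = {
    "actor.checkpoint_load_path": CHECKPOINT_DIR,
    "actor.tokenizer.tokenizer_model": CHECKPOINT_DIR,
    "rollout.model_dir": CHECKPOINT_DIR,
    "actor.model.is_lora": False,  # only needed when evaluating the model directly
}

config = {}
for dotted_key, value in overrides.items():
    set_dotted(config, dotted_key, value)

print(config)
```

In practice you would edit the same keys directly in the YAML file; the mapping only shows which entries change and that all three path settings point to the same checkpoint.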

License

This code repository and the model weights are licensed under the MIT License.

Model size: 7.54B params (Safetensors). Tensor type: BF16.