Safetensors
qwen2_5_vl

Model Card for RoboFAC-7B

Project Page Paper Dataset Model RoboFAC-7B is a large-scale vision-language model specifically finetuned for robotic failure understanding and correction. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.

Model Details

Model Description

  • Developed by: MINT Lab, Shanghai Jiao Tong University
  • Model type: Vision-Language Model (VLM) for robotic failure analysis
  • Languages: English (instruction-tuned for robotic QA)
  • License: Apache 2.0
  • Finetuned from model: Qwen/Qwen2.5-VL-7B-Instruct

Uses

Direct Use

The model is intended to be used in robotic systems as an external critic, to:

  • Perform task understanding by answering what the robot is doing.
  • Conduct failure diagnosis by identifying where and why it failed.
  • Generate correction suggestions based on visual observations.

Downstream Use

The model can be integrated into:

  • Vision-language control pipelines (e.g., VLA systems)
  • Robotic operation monitoring tools
  • Training agents with self-improvement capabilities

Quickstart

from transformers import AutoProcessor, AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("MINT-SJTU/RoboFAC-7B")
processor = AutoProcessor.from_pretrained("MINT-SJTU/RoboFAC-7B")

# Example usage with image frames and a question
inputs = processor(images=[...], text="Why did the robot fail?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
print(processor.batch_decode(outputs, skip_special_tokens=True))

Citation

BibTeX:

@misc{lu2025robofaccomprehensiveframeworkrobotic,
  title={RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction},
  author={Weifeng Lu and Minghao Ye and Zewei Ye and Ruihan Tao and Shuo Yang and Bo Zhao},
  year={2025},
  eprint={2505.12224},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.12224}
}
Downloads last month
27
Safetensors
Model size
8.29B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for MINT-SJTU/RoboFAC-7B

Finetuned
(524)
this model

Dataset used to train MINT-SJTU/RoboFAC-7B