Safetensors
qwen2_5_vl
RoboFAC-7B / README.md
Idphilosea's picture
Upload README.md
3671fa6 verified
|
raw
history blame
2.69 kB

Model Card for RoboFAC-7B

Project Page Paper Dataset Model RoboFAC-7B is a large-scale vision-language model specifically finetuned for robotic failure understanding and correction. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.

Model Details

Model Description

  • Developed by: MINT Lab, Shanghai Jiao Tong University
  • Model type: Vision-Language Model (VLM) for robotic failure analysis
  • Languages: English (instruction-tuned for robotic QA)
  • License: Apache 2.0
  • Finetuned from model: Qwen/Qwen2.5-VL-7B-Instruct

Uses

Direct Use

The model is intended to be used in robotic systems as an external critic, to:

  • Perform task understanding by answering what the robot is doing.
  • Conduct failure diagnosis by identifying where and why it failed.
  • Generate correction suggestions based on visual observations.

Downstream Use

The model can be integrated into:

  • Vision-language control pipelines (e.g., VLA systems)
  • Robotic operation monitoring tools
  • Training agents with self-improvement capabilities

Quickstart

from transformers import AutoProcessor, AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("MINT-SJTU/RoboFAC-7B")
processor = AutoProcessor.from_pretrained("MINT-SJTU/RoboFAC-7B")

# Example usage with image frames and a question
inputs = processor(images=[...], text="Why did the robot fail?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
print(processor.batch_decode(outputs, skip_special_tokens=True))

Citation

BibTeX:

@misc{lu2025robofaccomprehensiveframeworkrobotic,
  title={RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction},
  author={Weifeng Lu and Minghao Ye and Zewei Ye and Ruihan Tao and Shuo Yang and Bo Zhao},
  year={2025},
  eprint={2505.12224},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.12224}
}