|
--- |
|
datasets: |
|
- MINT-SJTU/RoboFAC-dataset |
|
base_model: |
|
- Qwen/Qwen2.5-VL-7B-Instruct |
|
--- |
|
|
|
# Model Card for RoboFAC-7B |
|
[](https://mint-sjtu.github.io/RoboFAC.io/) [](https://arxiv.org/abs/2505.12224) [](https://huggingface.co/datasets/MINT-SJTU/RoboFAC-dataset) [](https://huggingface.co/MINT-SJTU/RoboFAC-7B) |
|
RoboFAC-7B is a large-scale vision-language model specifically finetuned for **robotic failure understanding and correction**. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
* **Developed by:** [MINT Lab, Shanghai Jiao Tong University](https://mint-sjtu.github.io/) |
|
* **Model type:** Vision-Language Model (VLM) for robotic failure analysis |
|
* **Languages:** English (instruction-tuned for robotic QA) |
|
* **License:** Apache 2.0 |
|
* **Finetuned from model:** Qwen/Qwen2.5-VL-7B-Instruct |
|
|
|
|
|
--- |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model is intended to be used in robotic systems as an *external critic*, to: |
|
|
|
* Perform **task understanding** by answering what the robot is doing. |
|
* Conduct **failure diagnosis** by identifying where and why it failed. |
|
* Generate **correction suggestions** based on visual observations. |
|
|
|
### Downstream Use |
|
|
|
The model can be integrated into: |
|
|
|
* Vision-language control pipelines (e.g., VLA systems) |
|
* Robotic operation monitoring tools |
|
* Training agents with self-improvement capabilities |
|
--- |
|
|
|
## Quickstart |
|
|
|
```python |
|
from transformers import AutoProcessor, AutoModelForVision2Seq |
|
|
|
model = AutoModelForVision2Seq.from_pretrained("MINT-SJTU/RoboFAC-7B") |
|
processor = AutoProcessor.from_pretrained("MINT-SJTU/RoboFAC-7B") |
|
|
|
# Example usage with image frames and a question |
|
inputs = processor(images=[...], text="Why did the robot fail?", return_tensors="pt").to("cuda") |
|
outputs = model.generate(**inputs) |
|
print(processor.batch_decode(outputs, skip_special_tokens=True)) |
|
``` |
|
|
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex |
|
@misc{lu2025robofaccomprehensiveframeworkrobotic, |
|
title={RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction}, |
|
author={Weifeng Lu and Minghao Ye and Zewei Ye and Ruihan Tao and Shuo Yang and Bo Zhao}, |
|
year={2025}, |
|
eprint={2505.12224}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.RO}, |
|
url={https://arxiv.org/abs/2505.12224} |
|
} |
|
``` |
|
|
|
|