JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Model Description

JarvisIR is a novel system that leverages a Vision-Language Model (VLM) to intelligently restore images for autonomous driving perception in adverse weather. It acts as a central controller, dynamically coordinating multiple expert restoration models to tackle complex degradations such as rain, fog, low-light, and snow.

Key Features

  • VLM Controller: The first framework to employ a Vision-Language Model for orchestrating image restoration workflows.
  • Multi-Expert Coordination: Dynamically schedules specialized restoration models for tasks like denoising, super-resolution, and deraining (see the plan sketch after this list).
  • Adaptive Restoration: Effectively handles a wide range of adverse weather conditions, including night/low-light, rain, fog, and snow.
  • Advanced Training Strategy: Utilizes a two-stage process of Supervised Fine-Tuning (SFT) followed by alignment with Mixed-Rank Reward-based Human Feedback (MRRHF).
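
To make the multi-expert coordination above concrete, the sketch below shows one plausible shape for the restoration plan such a controller could emit. The JSON schema, expert names, and field names are illustrative assumptions for this card, not the repository's actual output format.

```python
import json

# Hypothetical restoration plan a VLM controller might emit for a rainy
# night-time frame. The schema and expert names are illustrative only.
plan_json = """
{
  "scene": "rainy night street",
  "degradations": ["rain_streaks", "low_light"],
  "steps": [
    {"expert": "derain_model", "reason": "remove rain streaks first"},
    {"expert": "low_light_enhancer", "reason": "then brighten the scene"}
  ]
}
"""

plan = json.loads(plan_json)
for step in plan["steps"]:
    print(f"apply {step['expert']}: {step['reason']}")
```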

Model Architecture

The system comprises three core components (a minimal interaction sketch follows the list):

  1. VLM Controller: A LLaVA-v1.5-7B model serves as the core for task planning and expert model selection.
  2. Expert Models: A suite of specialized networks, each tailored for a specific restoration task (e.g., deraining, defogging).
  3. Reward Models: A set of Image Quality Assessment (IQA) models that provide feedback for quality assessment and alignment during training.
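
The sketch below illustrates how these three components could interact at inference time. The `EXPERTS` registry, `iqa_score`, and `restore` names are placeholders invented for illustration; the real interfaces are defined by the repository's code.

```python
from typing import Callable, Dict, List

import numpy as np

# Placeholder expert registry; in practice each entry would wrap one of the
# specialized restoration networks shipped in agent-tools/.
EXPERTS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "derain": lambda img: img,      # stand-in for a deraining network
    "low_light": lambda img: img,   # stand-in for a low-light enhancer
    "denoise": lambda img: img,     # stand-in for a denoiser
}

def iqa_score(image: np.ndarray) -> float:
    """Stand-in for the IQA reward models that score restoration quality."""
    return float(image.mean())

def restore(image: np.ndarray, plan: List[str]) -> np.ndarray:
    """Apply the experts chosen by the VLM controller in the planned order."""
    for name in plan:
        image = EXPERTS[name](image)
    return image

# Example: the controller decided the frame needs deraining, then
# low-light enhancement.
degraded = np.random.rand(256, 256, 3).astype(np.float32)
restored = restore(degraded, ["derain", "low_light"])
print("reward score:", iqa_score(restored))
```

During MRRHF alignment, the scores produced by the reward models provide the feedback signal used to prefer better expert sequences.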

Training Data

JarvisIR was trained on a large-scale, comprehensive dataset (an illustrative sample record is sketched after this list):

  • CleanBench-Synthetic: A dataset of 150,000 synthetically degraded images with corresponding annotations.
  • CleanBench-Real: A collection of 80,000 real-world images captured in adverse weather, used for alignment training.
  • Comprehensive Coverage: The data covers four primary weather scenarios (night, rain, fog, snow) with various combinations of degradations.
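
For orientation only, the record below sketches how a single CleanBench-style sample could be annotated; the field names and file paths are assumptions made for this illustration and may not match the released data.

```python
# Hypothetical annotation record for one CleanBench-style sample.
# Field names and paths are illustrative assumptions, not the real schema.
sample = {
    "image": "images/night_rain_000123.png",
    "scenario": "night",                        # one of: night, rain, fog, snow
    "degradations": ["low_light", "rain_streaks"],
    "expert_sequence": ["low_light_enhancer", "derain_model"],
    "split": "CleanBench-Synthetic",
}

print(sample["scenario"], "->", " + ".join(sample["degradations"]))
```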

Performance

  • Achieves a 50% average improvement in perception metrics on the CleanBench-Real dataset compared to state-of-the-art all-in-one methods.
  • Demonstrates superior performance across all tested weather conditions.
  • Exhibits enhanced robustness and generalization capabilities in real-world driving scenarios.

Intended Use

Primary Use Cases:

  • Enhancing perception systems in autonomous vehicles.
  • Building robust, multi-weather image restoration pipelines.
  • Advancing research into the applications of Vision-Language Models in image processing.

Model Checkpoints

This repository provides the following model weights:

  • pretrained: The complete model after both Supervised Fine-Tuning and MRRHF alignment stages (see the loading sketch after this list).
  • agent-tools/: The weights for each individual expert restoration model.
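
The snippet below is a minimal loading sketch that assumes the pretrained controller weights follow the Hugging Face LLaVA-v1.5 layout; the local path, image file, and prompt are hypothetical and may need to be adapted to the released checkpoint format.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Hypothetical local path to the pretrained controller checkpoint from this
# repository; assumes a transformers-compatible LLaVA-v1.5 layout.
CONTROLLER_PATH = "./pretrained"

processor = AutoProcessor.from_pretrained(CONTROLLER_PATH)
model = LlavaForConditionalGeneration.from_pretrained(
    CONTROLLER_PATH, torch_dtype=torch.float16, device_map="auto"
)

# Example degraded frame (placeholder filename) and a LLaVA-v1.5-style prompt.
image = Image.open("degraded_frame.png")
prompt = (
    "USER: <image>\nDescribe the degradations in this driving scene and "
    "propose a restoration plan. ASSISTANT:"
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The expert weights under agent-tools/ are loaded by their respective restoration networks and are not covered by this sketch.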

Citation

If you find JarvisIR useful in your research, please cite our paper:

@inproceedings{lin2025jarvisir,
  title={JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration},
  author={Lin, Yunlong and Lin, Zixu and Chen, Haoyu and Pan, Panwang and Li, Chenxin and Chen, Sixiang and Wen, Kairun and Jin, Yeying and Li, Wenbo and Ding, Xinghao},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={22369--22380},
  year={2025}
}

Acknowledgments

This work contributes to the advancement of intelligent image restoration by integrating Vision-Language Models with expert system coordination.
