---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- ood-detection
- proximal-ood
- benchmark-analysis
- bdd100k
- pascal-voc
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- openimages
model-index:
- name: m-hood-yolov10-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50-95
      value: 0.34
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "79.5%"
- name: m-hood-faster-rcnn-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50
      value: 0.252
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "84.8%"
---

# M-Hood: Models for Mitigating Hallucinations in Object Detection

[![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/pdf/2503.07330)
[![Code](https://img.shields.io/badge/Code-GitLab-orange)](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)
[![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-blue)](https://huggingface.co/datasets/HugoHE/m-hood-dataset)

This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our **fine-tuned models**, which are designed to significantly reduce false positive detections on out-of-distribution (OoD) data.

Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident.
By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

## 🎯 Key Features

- **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
- **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
- **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
- **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
- **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.

## 🔬 The M-Hood Approach: How It Works

Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:

1. **Benchmarking Revisited**: We found that existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects. This leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.
2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
   - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
   - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
   - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.

The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.

## 📊 Performance Highlights

Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark. The Reduction column is measured relative to the Original (Vanilla) row of the same table.

#### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla) | 708 | - |
| **Ours (Fine-tuned)** | **145** | **-79.5%** |
| Original + KNN Filter | 297 | -58.1% |
| **Ours + KNN Filter** | **78** | **-89.0%** |

#### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla) | 2,595 | - |
| **Ours (Fine-tuned)** | **395** | **-84.8%** |
| Original + KNN Filter | 1,272 | -51.0% |
| **Ours + KNN Filter** | **270** | **-89.6%** |

## 🗂️ Model Collection

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general-purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general-purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth** | BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-voc-finetune.pth** | Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general-purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |

## 🚀 Quick Start

### YOLOv10 Usage

```python
from ultralytics import YOLO

# Load our robust, fine-tuned YOLOv10 model
model = YOLO('yolov10-bdd-finetune.pt')

# Run inference
results = model('path/to/your/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy    # bounding boxes
    scores = result.boxes.conf   # confidence scores
    classes = result.boxes.cls   # class predictions
```

### Faster R-CNN Usage

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# NOTE: The provided .pth files are state_dicts.
# You need to load them into a model instance.
# Example for a vanilla VOC model:
num_classes = 21  # 20 classes + background
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu'))
model.eval()

# Load an image as a float tensor in [0, 1] with shape (C, H, W)
image_tensor = convert_image_dtype(read_image('path/to/your/image.jpg'), torch.float)

# Run inference; torchvision detection models expect a list of image tensors
with torch.no_grad():
    predictions = model([image_tensor])

# Process results for the first (and only) image
boxes = predictions[0]['boxes']
scores = predictions[0]['scores']
labels = predictions[0]['labels']
```

## 📄 Citation

If you use our models, datasets, or methodology in your research, please cite our paper:

```bibtex
@inproceedings{he2025mitigating,
  title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
  author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
  booktitle={To Be Published},
  year={2025}
}
```

Please also consider citing the original works for the model architectures and datasets used.

## 📜 License

This work is released under the MIT License.
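
## 📎 Appendix: Proximal OoD as Background (Sketch)

The approach described in this README treats proximal OoD samples as background during fine-tuning. In an Ultralytics-style YOLO dataset, an image whose label file is empty contributes no boxes and is therefore trained on as background. The following is a minimal sketch of that idea, not the authors' actual data pipeline: the function name `mark_as_background`, the directory layout, and the set of image suffixes are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def mark_as_background(image_dir: Path, label_dir: Path) -> List[Path]:
    """Write an empty YOLO-format label file for every image in image_dir.

    An empty label file means "no objects", so the detector sees the whole
    image (including any proximal OoD object in it) as background.
    Hypothetical helper, not part of the M-Hood code base.
    """
    label_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        label_path = label_dir / f"{image_path.stem}.txt"
        label_path.write_text("")  # no boxes: image is pure background
        written.append(label_path)
    return written
```

After running this on a folder of proximal OoD images, those images and their empty label files can be placed alongside the original ID training data before fine-tuning. Faster R-CNN pipelines represent background images differently (for example, a target with zero boxes), so this sketch applies only to the YOLO label format.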