---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- ood-detection
- proximal-ood
- benchmark-analysis
- bdd100k
- pascal-voc
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- openimages
model-index:
- name: m-hood-yolov10-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50-95
      value: 0.34
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "79.5%"
- name: m-hood-faster-rcnn-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50
      value: 0.252
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "84.8%"
---

# M-Hood: Models for Mitigating Hallucinations in Object Detection

[![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/pdf/2503.07330)
[![Code](https://img.shields.io/badge/Code-GitLab-orange)](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)
[![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-blue)](https://huggingface.co/datasets/HugoHE/m-hood-dataset)

This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel **fine-tuned models** designed to significantly reduce false positive detections on out-of-distribution (OoD) data.

Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.

## 🎯 Key Features

- **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
- **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
- **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
- **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
- **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.

## 🔬 The M-Hood Approach: How It Works

Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:

1.  **Benchmarking Revisited**: We found that existing OoD test sets are flawed: about 13% of supposedly "OoD" images contain in-distribution (ID) objects, which makes performance evaluation inaccurate. We therefore created new, clean OoD benchmarks for reliable assessment.

2.  **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.
    - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but do not belong to them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
    - We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
    - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.
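
Both steps above come down to simple bookkeeping over class labels. The sketch below is a minimal illustration, not the authors' pipeline. It assumes a hypothetical annotation format in which each image is a list of `(class_name, box)` pairs, and it assumes that "treating proximal OoD samples as background" amounts to dropping their boxes, so the detector only ever sees those objects as unlabeled (negative) regions during fine-tuning.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Annotation = List[Tuple[str, Box]]        # hypothetical per-image annotation format


def contains_id_object(annotation: Annotation, id_classes: Set[str]) -> bool:
    """Step 1 audit: True if an image holds any in-distribution object."""
    return any(cls in id_classes for cls, _ in annotation)


def contamination_rate(ood_set: List[Annotation], id_classes: Set[str]) -> float:
    """Fraction of images in a supposedly OoD test set that contain ID objects."""
    if not ood_set:
        return 0.0
    flagged = sum(contains_id_object(a, id_classes) for a in ood_set)
    return flagged / len(ood_set)


def to_finetune_annotation(annotation: Annotation, id_classes: Set[str]) -> Annotation:
    """Step 2: keep ID boxes, drop all others so those objects become background."""
    return [(cls, box) for cls, box in annotation if cls in id_classes]


# Toy example mirroring the text: a model trained on 'horse' and 'cow'
ID_CLASSES = {"horse", "cow"}
ood_set = [
    [("deer", (0.0, 0.0, 10.0, 10.0))],                                # clean OoD image
    [("deer", (0.0, 0.0, 5.0, 5.0)), ("cow", (5.0, 5.0, 9.0, 9.0))],  # contaminated
]
print(contamination_rate(ood_set, ID_CLASSES))  # → 0.5
```

Note that an image whose only objects are proximal OoD ends up with an empty annotation list, i.e. a pure background image, which both YOLO-style and torchvision-style training pipelines can accept.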

The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.

## 📊 Performance Highlights

Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.

### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction vs. Vanilla |
|---------------------|---------------------|-----------|
| Original (Vanilla)  | 708                 | -         |
| **Ours (Fine-tuned)** | **145**             | **-79.5%**|
| Original + KNN Filter | 297                 | -58.1%    |
| **Ours + KNN Filter** | **78**              | **-89.0%**|
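
Each reduction figure is the relative drop in hallucination count against the vanilla row of the same table, i.e. `(1 - count / vanilla) * 100`. A quick check against the YOLOv10 numbers above:

```python
def reduction_pct(baseline: int, count: int) -> float:
    """Relative reduction (in percent) of a hallucination count vs. the vanilla baseline."""
    return round(100.0 * (1.0 - count / baseline), 1)


# YOLOv10 on BDD-100K, Near-OoD counts from the table above
vanilla = 708
for name, count in [("Ours (Fine-tuned)", 145),
                    ("Original + KNN Filter", 297),
                    ("Ours + KNN Filter", 78)]:
    print(f"{name}: -{reduction_pct(vanilla, count)}%")
# Ours (Fine-tuned): -79.5%
# Original + KNN Filter: -58.1%
# Ours + KNN Filter: -89.0%
```

The same formula reproduces the Faster R-CNN column, e.g. 2,595 to 395 hallucinations is an 84.8% reduction.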

### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction vs. Vanilla |
|---------------------|---------------------|-----------|
| Original (Vanilla)  | 2,595               | -         |
| **Ours (Fine-tuned)** | **395**             | **-84.8%**|
| Original + KNN Filter | 1,272               | -51.0%    |
| **Ours + KNN Filter** | **270**             | **-89.6%**|
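
The "KNN Filter" rows pair each detector with a post-hoc OoD filter. Its exact configuration is not described in this card, so the sketch below only illustrates the generic deep nearest-neighbor idea: score each detection by the distance from its L2-normalized feature to the k-th nearest feature in a bank of ID training detections, and discard detections whose score exceeds a threshold. The feature source (for instance a per-box embedding from the detector), `k`, and the 95% keep rate used to calibrate the threshold are all assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one feature vector per detection) to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_ood_scores(id_bank: np.ndarray, queries: np.ndarray, k: int = 1) -> np.ndarray:
    """Distance from each query to its k-th nearest ID feature (higher = more OoD)."""
    bank, q = l2_normalize(id_bank), l2_normalize(queries)
    # Brute-force pairwise Euclidean distances, shape (num_queries, num_bank)
    dists = np.linalg.norm(q[:, None, :] - bank[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, k - 1]


def fit_threshold(id_bank: np.ndarray, id_val: np.ndarray, k: int = 1,
                  keep_rate: float = 0.95) -> float:
    """Pick the threshold that keeps `keep_rate` of held-out ID detections."""
    return float(np.quantile(knn_ood_scores(id_bank, id_val, k), keep_rate))


def knn_filter(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True for detections kept as in-distribution."""
    return scores <= threshold


# Toy 2-D "features": ID detections cluster around the direction (1, 0)
rng = np.random.default_rng(0)
id_bank = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(200, 2))
id_val = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(50, 2))
threshold = fit_threshold(id_bank, id_val, k=5)

queries = np.array([[5.0, 0.0], [0.0, 5.0]])  # ID-like vs. unfamiliar direction
keep = knn_filter(knn_ood_scores(id_bank, queries, k=5), threshold)
```

Brute-force distances keep the sketch short; with a realistic feature bank, a dedicated nearest-neighbor index would normally replace the pairwise computation.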

## 🗂️ Model Collection

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |

### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth**| BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-voc-finetune.pth**| Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |
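
All checkpoint files follow one naming pattern, `<architecture>-<dataset>-<variant>`, with a `.pt` extension for YOLOv10 and `.pth` for Faster R-CNN. A small hypothetical helper (not part of any released code) can build these names programmatically, for example to evaluate the vanilla and fine-tuned variants side by side:

```python
from itertools import product

# Naming convention taken from the tables above
ARCH_EXT = {"yolov10": ".pt", "faster-rcnn": ".pth"}
DATASETS = ("bdd", "voc")
VARIANTS = ("vanilla", "finetune")


def checkpoint_name(arch: str, dataset: str, variant: str) -> str:
    """Return the checkpoint filename for one architecture/dataset/variant combination."""
    if arch not in ARCH_EXT or dataset not in DATASETS or variant not in VARIANTS:
        raise ValueError(f"unknown combination: {arch}, {dataset}, {variant}")
    return f"{arch}-{dataset}-{variant}{ARCH_EXT[arch]}"


# All eight checkpoints listed in this card
ALL_CHECKPOINTS = [checkpoint_name(a, d, v)
                   for a, d, v in product(ARCH_EXT, DATASETS, VARIANTS)]
```

To fetch a file from the Hub, the resulting name can be passed as `filename` to `huggingface_hub.hf_hub_download`, together with this repository's `repo_id` (not stated in this card).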


## 🚀 Quick Start

### YOLOv10 Usage

```python
from ultralytics import YOLO

# Load our robust, fine-tuned YOLOv10 model
model = YOLO('yolov10-bdd-finetune.pt')

# Run inference
results = model('path/to/your/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy   # bounding boxes
    scores = result.boxes.conf  # confidence scores
    classes = result.boxes.cls  # class predictions
```
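
Because every image in the Near-OoD and Far-OoD benchmarks is meant to contain no in-distribution object, any detection a model keeps on such an image counts as a hallucination. The sketch below counts them from per-image confidence scores; the 0.5 threshold is an assumption, not necessarily the setting used for the tables above.

```python
from typing import Iterable, Sequence


def count_hallucinations(per_image_scores: Iterable[Sequence[float]],
                         conf_thresh: float = 0.5) -> int:
    """Count detections at or above `conf_thresh` on images that contain no ID objects."""
    return sum(1 for scores in per_image_scores for s in scores if s >= conf_thresh)


# Made-up scores for three OoD images, for illustration only
toy_scores = [[0.91, 0.32], [], [0.55, 0.61, 0.12]]
print(count_hallucinations(toy_scores))  # → 3
```

With Ultralytics, the per-image scores can be collected as `[r.boxes.conf.tolist() for r in model.predict('path/to/ood_images/', stream=True)]`; running this for both `yolov10-bdd-vanilla.pt` and `yolov10-bdd-finetune.pt` gives a rough local version of the vanilla vs. fine-tuned comparison.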

### Faster R-CNN Usage

```python
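import torch
from torchvision.io import ImageReadMode, read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# NOTE: The provided .pth files are state_dicts.
# You need to load them into a model instance.
# Example for a vanilla VOC model (for the BDD 100K checkpoints, set
# num_classes to that model's number of classes + 1 for background):
num_classes = 21  # 20 VOC classes + background
# weights_backbone=None avoids downloading ImageNet backbone weights,
# since every weight is overwritten by the checkpoint anyway.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=num_classes)
state_dict = torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()

# Load an RGB image as a float CHW tensor in [0, 1]; the model's internal
# transform handles resizing and normalization.
image = read_image('path/to/your/image.jpg', mode=ImageReadMode.RGB)
image_tensor = convert_image_dtype(image, torch.float)

# Run inference; torchvision detectors take a list of image tensors
with torch.no_grad():
    predictions = model([image_tensor])

# Process results (one dict per input image)
boxes = predictions[0]['boxes']    # [N, 4] boxes as (x1, y1, x2, y2) in pixels
scores = predictions[0]['scores']  # confidence scores
labels = predictions[0]['labels']  # class indices (0 is reserved for background)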
```
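
The `labels` tensor holds integer class indices. The sketch below maps them to names, assuming the VOC checkpoints use the conventional alphabetical Pascal VOC class order with index 0 reserved for background. That ordering is an assumption, so check it against the training configuration before relying on it.

```python
# ASSUMED label order: background at 0, then the 20 Pascal VOC classes alphabetically
VOC_CLASSES = [
    "__background__", "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
    "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def describe(labels, scores, min_score=0.5):
    """Return (class_name, score) pairs for detections scoring at least `min_score`.

    Works with plain Python lists or with the 1-D tensors returned by the model,
    since int() and float() accept 0-d tensor elements.
    """
    return [(VOC_CLASSES[int(lbl)], float(score))
            for lbl, score in zip(labels, scores)
            if float(score) >= min_score]


print(describe([15, 7, 13], [0.97, 0.42, 0.66]))  # → [('person', 0.97), ('horse', 0.66)]
```

In the Quick Start above, this would be called as `describe(labels, scores)`.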

## 📄 Citation

If you use our models, datasets, or methodology in your research, please cite our paper:

```bibtex
@inproceedings{he2025mitigating,
  title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
  author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
  booktitle={To Be Published},
  year={2025}
}
```

Please also consider citing the original works for the model architectures and datasets used.

## 📜 License

This work is released under the MIT License.