M-Hood: Models for Mitigating Hallucinations in Object Detection


This repository contains the official models from the paper "Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection". It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our novel fine-tuned models designed to significantly reduce false positive detections on out-of-distribution (OoD) data.

Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated "proximal OoD" samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an 88% reduction in overall hallucination error on the BDD-100K benchmark when combined with an OoD filter.

🎯 Key Features

  • Novel Fine-Tuning for Hallucination Mitigation: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
  • Vanilla vs. Fine-tuned Comparison: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
  • Dual Architecture Support: Includes both the real-time YOLOv10 and the high-accuracy Faster R-CNN.
  • Multi-Dataset Scope: Models trained on BDD 100K for autonomous driving and Pascal VOC for general object detection.
  • Corrected Benchmark Datasets: Accompanied by new, carefully curated OoD test sets (Near-OoD, Far-OoD) that address the flaws in previous benchmarks.

🔬 The M-Hood Approach: How It Works

Object detectors often "hallucinate" and produce high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:

  1. Benchmarking Revisited: We found existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects. This leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.

  2. Proximal OoD Fine-Tuning: Our core contribution is a fine-tuning strategy that makes the detector itself more robust.

    • We create a dataset of "proximal OoD" samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
    • We fine-tune the original models on a combined dataset of original ID data and these new proximal OoD samples.
    • During this process, the proximal OoD samples are treated as background, effectively teaching the model to suppress its predictions for these and other similar novel objects.

The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
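
Conceptually, the fine-tuning set is simply the original ID data plus proximal OoD images that contribute no boxes at all. Below is a minimal sketch of how such a combined set could be assembled in a YOLO-style directory layout; the paths and file layout are placeholders for illustration, not the released training pipeline.

from pathlib import Path
import shutil

ID_DIR = Path("data/id")                      # images/ + labels/ with original ID annotations
PROXIMAL_OOD_DIR = Path("data/proximal_ood")  # images of proximal OoD objects (e.g. 'deer')
OUT_DIR = Path("data/finetune")

(OUT_DIR / "images").mkdir(parents=True, exist_ok=True)
(OUT_DIR / "labels").mkdir(parents=True, exist_ok=True)

# 1. Keep ID images together with their original label files.
for img in (ID_DIR / "images").glob("*.jpg"):
    shutil.copy(img, OUT_DIR / "images" / img.name)
    label = ID_DIR / "labels" / (img.stem + ".txt")
    if label.exists():
        shutil.copy(label, OUT_DIR / "labels" / label.name)

# 2. Add proximal OoD images with *empty* label files, so anything the
#    detector predicts on them is penalised as background during fine-tuning.
for img in PROXIMAL_OOD_DIR.glob("*.jpg"):
    shutil.copy(img, OUT_DIR / "images" / img.name)
    (OUT_DIR / "labels" / (img.stem + ".txt")).write_text("")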

📊 Performance Highlights

Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our Near-OoD benchmark.

YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration   | Hallucination Count | Reduction |
|-----------------------|---------------------|-----------|
| Original (Vanilla)    | 708                 | -         |
| Ours (Fine-tuned)     | 145                 | -79.5%    |
| Original + KNN Filter | 297                 | -58.1%    |
| Ours + KNN Filter     | 78                  | -89.0%    |

Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration   | Hallucination Count | Reduction |
|-----------------------|---------------------|-----------|
| Original (Vanilla)    | 2,595               | -         |
| Ours (Fine-tuned)     | 395                 | -84.8%    |
| Original + KNN Filter | 1,272               | -51.0%    |
| Ours + KNN Filter     | 270                 | -89.6%    |
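
The "+ KNN Filter" rows apply a post-hoc, feature-space KNN out-of-distribution filter on top of the detector. As a rough illustration of how such a filter operates, the sketch below scores each detection by its distance to the k-th nearest in-distribution training feature and discards detections whose score exceeds a threshold; the feature embeddings, k, and threshold here are assumptions for illustration, not the paper's exact configuration.

import numpy as np

def knn_ood_scores(det_feats: np.ndarray, id_bank: np.ndarray, k: int = 10) -> np.ndarray:
    """Distance to the k-th nearest ID training feature; larger = more OoD-like."""
    # L2-normalise both sets so distances are comparable.
    det = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    bank = id_bank / np.linalg.norm(id_bank, axis=1, keepdims=True)
    # Pairwise distances between detection features and the ID feature bank.
    dists = np.linalg.norm(det[:, None, :] - bank[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, k - 1]

def filter_detections(boxes, scores, labels, det_feats, id_bank, threshold=0.8, k=10):
    """Drop detections whose KNN OoD score exceeds the threshold."""
    keep = knn_ood_scores(det_feats, id_bank, k) < threshold
    return boxes[keep], scores[keep], labels[keep]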

πŸ—‚οΈ Model Collection

YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| yolov10-bdd-vanilla.pt | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | Download |
| yolov10-bdd-finetune.pt | BDD 100K | Fine-tuned | 62MB | Robust model with reduced OoD hallucinations. | Download |
| yolov10-voc-vanilla.pt | Pascal VOC | Vanilla | 63MB | Baseline model for general purpose object detection. | Download |
| yolov10-voc-finetune.pt | Pascal VOC | Fine-tuned | 94MB | Robust general purpose model for OoD scenarios. | Download |

Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| faster-rcnn-bdd-vanilla.pth | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | Download |
| faster-rcnn-bdd-finetune.pth | BDD 100K | Fine-tuned | 158MB | Robust high-accuracy model for OoD scenarios. | Download |
| faster-rcnn-voc-vanilla.pth | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | Download |
| faster-rcnn-voc-finetune.pth | Pascal VOC | Fine-tuned | 158MB | Robust high-accuracy general purpose model. | Download |

(Note: KITTI models mentioned in an earlier version of this README are not covered by the paper's experiments, so this collection focuses on the BDD 100K and Pascal VOC models evaluated there.)
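
If the weight files are hosted in this repository on the Hugging Face Hub, they can also be fetched programmatically with huggingface_hub. The repo_id below is a placeholder and should be replaced with this repository's actual identifier:

from huggingface_hub import hf_hub_download

# Placeholder repo_id; substitute the actual Hub repository for these models.
weights_path = hf_hub_download(
    repo_id="<org-or-user>/M-Hood",
    filename="yolov10-bdd-finetune.pt",
)
print(weights_path)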

🚀 Quick Start

YOLOv10 Usage

from ultralytics import YOLO

# Load our robust, fine-tuned YOLOv10 model
model = YOLO('yolov10-bdd-finetune.pt')

# Run inference
results = model('path/to/your/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy   # bounding boxes
    scores = result.boxes.conf  # confidence scores
    classes = result.boxes.cls  # class predictions

Faster R-CNN Usage

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

# NOTE: The provided .pth files are state_dicts.
# You need to load them into a model instance.
# Example for a vanilla VOC model:
num_classes = 21 # 20 classes + background
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu'))
model.eval()

# Prepare an input image as a 3xHxW float tensor in [0, 1]
image_tensor = to_tensor(Image.open('path/to/your/image.jpg').convert('RGB'))

# Run inference (torchvision detection models expect a list of image tensors)
with torch.no_grad():
    predictions = model([image_tensor])

# Process results
boxes = predictions[0]['boxes']
scores = predictions[0]['scores']
labels = predictions[0]['labels']
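
Note that torchvision's detection models return every candidate above a low internal score threshold, so in practice you will usually filter the outputs by confidence yourself; the 0.5 cut-off below is only an example value.

# Keep only reasonably confident detections (threshold chosen for illustration)
keep = scores > 0.5
boxes, scores, labels = boxes[keep], scores[keep], labels[keep]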

📄 Citation

If you use our models, datasets, or methodology in your research, please cite our paper:

@inproceedings{he2025mitigating,
  title={{Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection}},
  author={He, Weicheng and Wu, Changshun and Cheng, Chih-Hong and Huang, Xiaowei and Bensalem, Saddek},
  booktitle={To Be Published},
  year={2025}
}

Please also consider citing the original works for the model architectures and datasets used.

📜 License

This work is released under the MIT License.
