D-FINE
D-FINE (Redefine Regression Task in DETRs as Fine-grained Distribution Refinement) is a family of real-time object detectors that improves localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.
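The core idea can be illustrated with a toy sketch: each box edge is predicted as a probability distribution over discrete offset bins, and the final offset is the expectation of that distribution. Note this is a generic illustration of distribution-based regression, not the repository's actual layers; `num_bins` and the offset range are made-up values.

```python
import torch
import torch.nn.functional as F

num_bins = 16                      # hypothetical number of offset bins
logits = torch.randn(4, num_bins)  # one distribution per box edge (l, t, r, b)

# Softmax turns the logits into a probability distribution over candidate offsets
probs = F.softmax(logits, dim=-1)

# Bin centers span a fixed offset range; the expected value yields a
# sub-bin-accurate, differentiable offset for each edge.
bin_centers = torch.linspace(0.0, 1.0, num_bins)
offsets = (probs * bin_centers).sum(dim=-1)  # shape [4]
print(offsets)
```

Because the prediction is a full distribution rather than a point estimate, later stages can refine it incrementally, which is what makes the multi-stage sharpening in D-FINE possible.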
This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs.
Sample Predictions Across D-FINE Variants
*(Sample prediction images for the Nano, Small, Medium, and Large variants.)*
Try it in the Browser
You can test the models using our interactive Gradio demo:
D-FINE Variants
The D-FINE family includes five model sizes trained on the L&A Pucks Dataset, each offering a different balance between model size and detection accuracy.
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|---|---|---|---|---|---|
| Nano | 3.76M | 0.825 | | | |
| Small | 10.3M | 0.816 | | | |
| Medium | 19.6M | 0.840 | | | |
| Large | 31.2M | 0.828 | | | |
| Extra Large | 62.7M | 0.803 | | | |
mAP values are evaluated on the validation set of the L&A Pucks Dataset.
Installation
```bash
pip install -r requirements.txt
```
Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.
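For example, with `venv` on a Unix-like shell:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```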
Quick start on L&A Pucks Dataset
```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont

# Load the test split (use "train" for the training split)
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")

# Access the first example
image = ds[0]["image"]

# Load the shared processor and a model variant
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)

# Process the image (resize and pad)
inputs = processor(image)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    font = ImageFont.load_default()  # fall back if the font is not installed

for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]
    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)

# Save the result
image.save("output.jpg")
```
How to Use
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
Step 1: Load the Preprocessor
The preprocessor is common to all D-FINE variants and handles resizing and padding.
```python
from transformers import AutoProcessor

# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```
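Since the processor's output is later unpacked into the model with `**inputs`, it behaves as a mapping. A quick way to check what it produces (making no assumption about the exact keys):

```python
inputs = processor(image)
print(list(inputs.keys()))  # the tensors the model expects as keyword arguments
```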
Step 2: Load a D-FINE model variant
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.
```python
from transformers import AutoModel

model_variant = "nano"  # or "small", "medium", "large", "xlarge"

# Load the chosen D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```
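As a quick sanity check, you can compare a loaded variant's parameter count against the table above (this assumes the returned object is a standard PyTorch `nn.Module`, which is how `AutoModel` normally behaves):

```python
num_params = sum(p.numel() for p in model.parameters())
print(f"{model_variant}: {num_params / 1e6:.2f}M parameters")  # e.g. ~3.76M for nano
```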
Step 3: Run Inference
Using Pillow with a single image or a batch of images:
```python
from PIL import Image

# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)

# Batch of images
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB"),
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
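If you need the detections as plain Python types, e.g. for logging or JSON serialization, here is a minimal post-processing sketch (it assumes `boxes`, `labels`, and `scores` are torch tensors, as the shapes above suggest):

```python
detections = []
for result in outputs:
    for box, label, score in zip(result["boxes"], result["labels"], result["scores"]):
        detections.append({
            "box": [round(v, 1) for v in box.tolist()],  # [x1, y1, x2, y2]
            "label": int(label),
            "score": float(score),
        })
print(detections[:3])  # inspect the first few detections
```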
Using OpenCV with a single image or a batch of images:
```python
import cv2

# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)

# Batch of OpenCV images
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg"),
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
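To visualize the OpenCV results, you can draw the boxes directly with `cv2.rectangle` (a sketch mirroring the Pillow example above; it assumes a single input image and torch-tensor outputs):

```python
for result in outputs:
    for box, score in zip(result["boxes"], result["scores"]):
        x1, y1, x2, y2 = map(int, box.tolist())
        cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)  # BGR blue
        cv2.putText(image, f"{score:.2f}", (x1, max(0, y1 - 5)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
cv2.imwrite("output_cv2.jpg", image)
```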
License
The D-FINE models are released under the Apache License 2.0. The L&A Pucks Dataset, on which the models were trained, is released under the L&Aser Dataset Replication License (Version 1.0).
Citation
If you use D-FINE or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```