D-FINE
D-FINE (Redefine Regression Task in DETRs as Fine-grained Distribution Refinement) is a family of real-time object detectors that improves localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.
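The core idea can be illustrated with a toy sketch: each box edge is predicted as a probability distribution over discrete offset bins, and the final offset is the expectation of that distribution. Note this is a generic illustration of distribution-based regression, not the repository's actual layers; `num_bins` and the offset range are made-up values.

```python
import torch
import torch.nn.functional as F

num_bins = 16                      # hypothetical number of offset bins
logits = torch.randn(4, num_bins)  # one distribution per box edge (l, t, r, b)

# Softmax turns the logits into a probability distribution over candidate offsets
probs = F.softmax(logits, dim=-1)

# Bin centers span a fixed offset range; the expected value yields a
# sub-bin-accurate, differentiable offset for each edge.
bin_centers = torch.linspace(0.0, 1.0, num_bins)
offsets = (probs * bin_centers).sum(dim=-1)  # shape [4]
print(offsets)
```

Because the prediction is a full distribution rather than a point estimate, later stages can refine it incrementally, which is what makes the multi-stage sharpening in D-FINE possible.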
This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs.
Sample Predictions Across D-FINE Variants
*(Sample prediction images for the Nano, Small, Medium, and Large variants.)*
Try it in the Browser
You can test the models using our interactive Gradio demo:
D-FINE Variants
The D-FINE family includes five model sizes trained on the L&A Pucks Dataset, each offering a different balance between model size and detection accuracy.
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|---|---|---|---|---|---|
| Nano | 3.76M | 0.825 | | | |
| Small | 10.3M | 0.816 | | | |
| Medium | 19.6M | 0.840 | | | |
| Large | 31.2M | 0.828 | | | |
| Extra Large | 62.7M | 0.803 | | | |
mAP values are evaluated on the validation set of the L&A Pucks Dataset.
Installation
```bash
pip install -r requirements.txt
```
Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.
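For example, with `venv` on a Unix-like shell:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```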
Quick start on L&A Pucks Dataset
```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont

# Load the test split (use "train" for the training split)
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")

# Access the first example
image = ds[0]["image"]

# Load the shared processor and a model variant
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)

# Process the image (resize and pad)
inputs = processor(image)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    font = ImageFont.load_default()  # fall back if the font is not installed

for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]
    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)

# Save the result
image.save("output.jpg")
```
How to Use
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
Step 1: Load the Preprocessor
The preprocessor is common to all D-FINE variants and handles resizing and padding.
```python
from transformers import AutoProcessor

# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```
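Since the processor's output is later unpacked into the model with `**inputs`, it behaves as a mapping. A quick way to check what it produces (making no assumption about the exact keys):

```python
inputs = processor(image)
print(list(inputs.keys()))  # the tensors the model expects as keyword arguments
```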
Step 2: Load a D-FINE model variant
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.
```python
from transformers import AutoModel

model_variant = "nano"  # or "small", "medium", "large", "xlarge"

# Load the chosen D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```
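As a quick sanity check, you can compare a loaded variant's parameter count against the table above (this assumes the returned object is a standard PyTorch `nn.Module`, which is how `AutoModel` normally behaves):

```python
num_params = sum(p.numel() for p in model.parameters())
print(f"{model_variant}: {num_params / 1e6:.2f}M parameters")  # e.g. ~3.76M for nano
```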
Step 3: Run Inference
Using Pillow with a single image or a batch of images:
```python
from PIL import Image

# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)

# Batch of images
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB"),
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
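If you need the detections as plain Python types, e.g. for logging or JSON serialization, here is a minimal post-processing sketch (it assumes `boxes`, `labels`, and `scores` are torch tensors, as the shapes above suggest):

```python
detections = []
for result in outputs:
    for box, label, score in zip(result["boxes"], result["labels"], result["scores"]):
        detections.append({
            "box": [round(v, 1) for v in box.tolist()],  # [x1, y1, x2, y2]
            "label": int(label),
            "score": float(score),
        })
print(detections[:3])  # inspect the first few detections
```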
Using OpenCV with a single image or a batch of images:
```python
import cv2

# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)

# Batch of OpenCV images
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg"),
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
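To visualize the OpenCV results, you can draw the boxes directly with `cv2.rectangle` (a sketch mirroring the Pillow example above; it assumes a single input image and torch-tensor outputs):

```python
for result in outputs:
    for box, score in zip(result["boxes"], result["scores"]):
        x1, y1, x2, y2 = map(int, box.tolist())
        cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)  # BGR blue
        cv2.putText(image, f"{score:.2f}", (x1, max(0, y1 - 5)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
cv2.imwrite("output_cv2.jpg", image)
```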
License
The D-FINE models are released under the Apache License 2.0. The L&A Pucks Dataset, on which the models were trained, is released under the L&Aser Dataset Replication License (Version 1.0).
Citation
If you use D-FINE or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```