Conditional-DETR ResNet-50 - Handwritten Signature Detection

This repository presents a Conditional-DETR model with a ResNet-50 backbone, fine-tuned to detect handwritten signatures in document images. This model achieved the highest mAP@0.5 (93.65%) among all architectures tested in our evaluation.

| Resource | Links / Badges | Details |
|----------|----------------|---------|
| Article | Paper page | A detailed community article covering the full development process of the project |
| Model Files (YOLOv8s) | HF Model | Available formats: PyTorch, ONNX, TensorRT |
| Dataset – Original | Roboflow | 2,819 document images annotated with signature coordinates |
| Dataset – Processed | HF Dataset | Augmented and pre-processed version (640px) for model training |
| Notebooks – Model Experiments | Colab, W&B Training | Complete training and evaluation pipeline with selection among different architectures (YOLO, DETR, RT-DETR, Conditional-DETR, YOLOS) |
| Notebooks – HP Tuning | Colab, W&B HP Tuning | Optuna trials for optimizing the precision/recall balance |
| Inference Server | GitHub | Complete deployment and inference pipeline with Triton Inference Server (OpenVINO, Docker, Triton) |
| Live Demo | HF Space | Graphical interface with real-time inference (Gradio, Plotly) |

Dataset

Dataset on HF
The training utilized a dataset built from two public datasets: [Tobacco800](https://paperswithcode.com/dataset/tobacco-800) and [signatures-xc8up](https://universe.roboflow.com/roboflow-100/signatures-xc8up), unified and processed in [Roboflow](https://roboflow.com/).

Dataset Summary:

  • Training: 1,980 images (70%)
  • Validation: 420 images (15%)
  • Testing: 419 images (15%)
  • Format: COCO JSON
  • Resolution: 640x640 pixels

Roboflow Dataset
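
For quick experimentation, the processed dataset can be pulled directly from the Hub with the `datasets` library. This is a minimal sketch; the repository id below is illustrative, and the exact feature schema should be checked on the dataset page.

```python
from datasets import load_dataset

# Illustrative repo id -- replace with the actual processed dataset on the Hub.
dataset = load_dataset("tech4humans/signature-detection")

print(dataset)            # expected splits: train / validation / test
sample = dataset["train"][0]
print(sample.keys())      # e.g. the document image plus COCO-style signature boxes
```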


Training Process

The training process involved the following steps:

1. Model Selection:

Various object detection models were evaluated to identify the best balance between precision, recall, and inference time.

| Model | Inference Time - CPU (ms) | mAP50 | mAP50-95 |
|-------|---------------------------|-------|----------|
| rtdetr-l | 583.608 | 0.92709 | 0.622364 |
| yolos-base | 1706.49 | 0.901154 | 0.583569 |
| yolos-tiny | 265.346 | 0.869814 | 0.469064 |
| conditional-detr-resnet-50 | 476.831 | 0.936524 | 0.653321 |
| detr-resnet-50 | 425.649 | 0.88885 | 0.579428 |
| yolov8x | 1259.47 | 0.794237 | 0.552919 |
| yolov8l | 871.329 | 0.800312 | 0.593976 |
| yolov8m | 401.183 | 0.875322 | 0.665495 |
| yolov8s | 216.6 | 0.874721 | 0.65457 |
| yolov8n | 110.442 | 0.816089 | 0.623963 |
| yolo11x | 1016.68 | 0.667074 | 0.482289 |
| yolo11l | 518.147 | 0.707409 | 0.499126 |
| yolo11m | 381.652 | 0.809557 | 0.600797 |
| yolo11s | 179.792 | 0.835605 | 0.638849 |
| yolo11n | 106.656 | 0.813799 | 0.617496 |
| yolov10x | 821.183 | 0.681023 | 0.474535 |
| yolov10l | 580.767 | 0.726802 | 0.522654 |
| yolov10b | 473.109 | 0.789835 | 0.578874 |
| yolov10m | 320.12 | 0.787688 | 0.581259 |
| yolov10s | 150.076 | 0.663877 | 0.473857 |
| yolov10n | 73.8596 | 0.734332 | 0.552704 |

Model Selection

Highlights:

  • Best mAP50: conditional-detr-resnet-50 (0.936524)
  • Best mAP50-95: yolov8m (0.665495)
  • Fastest Inference Time: yolov10n (73.8596 ms)

Detailed experiments are available on Weights & Biases.
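
For reference, CPU inference time for a Transformers-based candidate can be measured with a simple timing loop like the sketch below. This is a simplified illustration, not the exact benchmarking code from the notebooks; the public Microsoft checkpoint is used as a stand-in.

```python
import time

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Stand-in checkpoint; any of the candidate detectors can be timed the same way.
model_name = "microsoft/conditional-detr-resnet-50"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name).eval()

image = Image.open("path/to/your/document.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    model(**inputs)  # warm-up pass
    start = time.perf_counter()
    for _ in range(10):
        model(**inputs)
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000

print(f"Mean CPU inference time: {elapsed_ms:.1f} ms")
```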

2. Hyperparameter Tuning:

The YOLOv8s model, which demonstrated a good balance of inference time, precision, and recall, was selected for hyperparameter tuning.

Optuna was used to run 20 optimization trials over the following search space:

```python
# Hyperparameters sampled in each Optuna trial
dropout = trial.suggest_float("dropout", 0.0, 0.5, step=0.1)
lr0 = trial.suggest_float("lr0", 1e-5, 1e-1, log=True)
box = trial.suggest_float("box", 3.0, 7.0, step=1.0)
cls = trial.suggest_float("cls", 0.5, 1.5, step=0.2)
opt = trial.suggest_categorical("optimizer", ["AdamW", "RMSProp"])
```
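
As an illustration, this search space can be wired into an Optuna study roughly as follows. This is a sketch under assumed settings: the dataset YAML path, epoch count, and the returned metric are placeholders, not the exact notebook code.

```python
import optuna
from ultralytics import YOLO

def objective(trial):
    # Same search space as shown above
    dropout = trial.suggest_float("dropout", 0.0, 0.5, step=0.1)
    lr0 = trial.suggest_float("lr0", 1e-5, 1e-1, log=True)
    box = trial.suggest_float("box", 3.0, 7.0, step=1.0)
    cls = trial.suggest_float("cls", 0.5, 1.5, step=0.2)
    opt = trial.suggest_categorical("optimizer", ["AdamW", "RMSProp"])

    # Train a fresh YOLOv8s run with the sampled hyperparameters
    model = YOLO("yolov8s.pt")
    model.train(
        data="signature-detection/data.yaml",  # placeholder dataset config
        epochs=50, imgsz=640,
        dropout=dropout, lr0=lr0, box=box, cls=cls, optimizer=opt,
    )

    # Score the trial on the validation split
    metrics = model.val(data="signature-detection/data.yaml", split="val")
    return metrics.box.map50  # maximize mAP@0.5

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```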

Results can be visualized here: Hypertuning Experiment.

Hypertuning Sweep

3. Evaluation:

At the end of training, the models were evaluated on the test set in both ONNX (CPU) and TensorRT (GPU - T4) formats. The reported metrics include precision, recall, mAP50, and mAP50-95.
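
The exports themselves can be produced with the Ultralytics API, roughly as in the sketch below (the checkpoint path is a placeholder):

```python
from ultralytics import YOLO

# Placeholder path to the tuned YOLOv8s weights
model = YOLO("runs/detect/train/weights/best.pt")

model.export(format="onnx", imgsz=640)    # ONNX model for CPU inference (ONNX Runtime)
model.export(format="engine", imgsz=640)  # TensorRT engine for GPU inference (e.g. T4)
```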

Trials

Results Comparison:

| Metric | Base Model | Best Trial (#10) | Difference |
|--------|------------|------------------|------------|
| mAP50 | 87.47% | 95.75% | +8.28% |
| mAP50-95 | 65.46% | 66.26% | +0.81% |
| Precision | 97.23% | 95.61% | -1.63% |
| Recall | 76.16% | 91.21% | +15.05% |
| F1-score | 85.42% | 93.36% | +7.94% |

Results

After hyperparameter tuning of the YOLOv8s model, the best model achieved the following results on the test set:

  • Precision: 94.74%
  • Recall: 89.72%
  • mAP@50: 94.50%
  • mAP@50-95: 67.35%
  • Inference Time:
    • ONNX Runtime (CPU): 171.56 ms
    • TensorRT (GPU - T4): 7.657 ms
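
The ONNX Runtime figure above corresponds to running the exported model on CPU, along the lines of this simplified sketch. The file name, input shape, and preprocessing are assumptions, and the raw outputs still require YOLO post-processing such as confidence filtering and NMS.

```python
import numpy as np
import onnxruntime as ort

# Placeholder file name for the exported model
session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder for a preprocessed 640x640 document image (RGB, CHW, scaled to [0, 1])
image = np.random.rand(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {input_name: image})
print([o.shape for o in outputs])  # raw predictions, before thresholding / NMS
```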

How to Use

Installation

```bash
pip install transformers torch torchvision pillow
```

Inference

```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import torch

# Load model and processor
model_name = "tech4humans/conditional-detr-50-signature-detector"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)

# Load and process image (convert to RGB in case the scan is grayscale)
image = Image.open("path/to/your/document.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.5
)[0]

# Extract detections
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected signature with confidence {round(score.item(), 3)} at location {box}")
```

Visualization

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def visualize_predictions(image_path, results, threshold=0.5):
    image = Image.open(image_path)
    fig, ax = plt.subplots(1, figsize=(12, 9))
    ax.imshow(image)

    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        if score > threshold:
            x, y, x2, y2 = box.tolist()
            width, height = x2 - x, y2 - y

            rect = patches.Rectangle(
                (x, y), width, height,
                linewidth=2, edgecolor='red', facecolor='none'
            )
            ax.add_patch(rect)
            ax.text(x, y - 10, f'Signature: {score:.3f}',
                    bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.7))

    ax.set_title("Signature Detection Results")
    plt.axis('off')
    plt.show()

# Use the visualization
visualize_predictions("path/to/your/document.jpg", results)
```

Demo

You can explore the model and test real-time inference in the Hugging Face Spaces demo, built with Gradio and ONNXRuntime.

Open in Spaces


🔗 Inference with Triton Server

If you want to deploy this signature detection model in a production environment, check out our inference server repository based on the NVIDIA Triton Inference Server.
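
For a rough idea of what a client call looks like, the sketch below uses the official `tritonclient` package. The endpoint, model name, and tensor names are placeholders and should be taken from the server's actual model configuration in the repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder endpoint and tensor/model names -- check the repository's config.pbtxt
client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # preprocessed document image
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

response = client.infer(model_name="signature-detector", inputs=[infer_input])
detections = response.as_numpy("output")
print(detections.shape)
```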


Infrastructure

Software

The model was trained and tuned using a Jupyter Notebook environment.

  • Operating System: Ubuntu 22.04
  • Python: 3.10.12
  • PyTorch: 2.5.1+cu121
  • Ultralytics: 8.3.58
  • Roboflow: 1.1.50
  • Optuna: 4.1.0
  • ONNX Runtime: 1.20.1
  • TensorRT: 10.7.0

Hardware

Training was performed on a Google Cloud Platform n1-standard-8 instance with the following specifications:

  • CPU: 8 vCPUs
  • GPU: NVIDIA Tesla T4

License

Model Weights, Code and Training Materials – Apache 2.0

  • License: Apache License 2.0
  • Usage: All training scripts, deployment code, and usage instructions are licensed under the Apache 2.0 license.

Contact and Information

For further information, questions, or contributions, contact us at [email protected].

📧 Email: [email protected]
🌐 Website: www.tech4.ai
💼 LinkedIn: Tech4Humans

Author

Samuel Lima

AI Research Engineer

HuggingFace

Responsibilities in this Project

  • 🔬 Model development and training
  • 📊 Dataset analysis and processing
  • ⚙️ Hyperparameter optimization and performance evaluation
  • 📝 Technical documentation and model card

Developed with 💜 by Tech4Humans
