RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of PekingU/rtdetr_v2_r101vd for voucher classification and object detection.

Model Details

Model Description

  • Model Type: Object Detection (RT-DETRv2)
  • Base Model: PekingU/rtdetr_v2_r101vd
  • Task: Multi-class voucher classification and detection
  • Classes: 3 classes
    • 0: digital (digital invoices)
    • 1: fisico (physical receipts on blank pages)
    • 2: tesoreria (small on-site payment receipts)
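In Transformers, a class list like this is stored on the model config as `id2label` (id to name) and `label2id` (name to id). Below is a sketch of both mappings. It assumes the stored names are exactly the lowercase class names above, so check them against `model.config.id2label` after loading.

```python
# Assumed mapping; verify against model.config.id2label after loading the model.
ID2LABEL = {0: "digital", 1: "fisico", 2: "tesoreria"}

# Inverse mapping (class name -> id), the same shape as a config's label2id.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(LABEL2ID["tesoreria"])  # 2
```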

Training Details

Training Dataset:

  • Total Samples: 663
  • Class Distribution:
    • fisico (id: 1): 441 samples (66.5%)
    • digital (id: 0): 177 samples (26.7%)
    • tesoreria (id: 2): 45 samples (6.8%)
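The percentages follow directly from the raw counts. Fisico outnumbers tesoreria by roughly 10 to 1, which is worth keeping in mind when reading the per-class metrics. The helper below reproduces them; `class_shares` is a hypothetical name and is not part of the training code.

```python
def class_shares(counts):
    """Return each class's percentage of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


train_counts = {"fisico": 441, "digital": 177, "tesoreria": 45}
print(class_shares(train_counts))  # {'fisico': 66.5, 'digital': 26.7, 'tesoreria': 6.8}
print(round(train_counts["fisico"] / train_counts["tesoreria"], 1))  # 9.8 (imbalance ratio)
```

The same function also reproduces the evaluation-set shares: 53, 127, and 13 samples give 27.5%, 65.8%, and 6.7%.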

Training Configuration:

  • Image Size: 832x832
  • Batch Size: 32
  • Learning Rate: 1e-05
  • Weight Decay: 0.01
  • Epochs: 80
  • Validation Split: 0.15
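The step arithmetic implied by this configuration can be sketched as follows. The sketch assumes that the 663 samples listed above form the training set, that training ran on a single device, and that no gradient accumulation was used. `TrainConfig` and `optimizer_steps` are hypothetical names and are not taken from the training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the Training Configuration list (hypothetical container)."""
    image_size: int = 832
    batch_size: int = 32
    learning_rate: float = 1e-5
    weight_decay: float = 0.01
    epochs: int = 80


def optimizer_steps(num_samples: int, cfg: TrainConfig, drop_last: bool = False) -> tuple:
    """Return (steps_per_epoch, total_steps), assuming one device and no gradient accumulation."""
    if drop_last:
        per_epoch = num_samples // cfg.batch_size  # incomplete final batch is discarded
    else:
        per_epoch = math.ceil(num_samples / cfg.batch_size)  # smaller final batch is kept
    return per_epoch, per_epoch * cfg.epochs


print(optimizer_steps(663, TrainConfig()))  # (21, 1680)
```

With `drop_last=True`, the same call returns (20, 1600).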

Data Processing:

  • Pre-augmented dataset used (no runtime augmentation)
  • External train/validation split (required; generate it with create_train_val_split.py)
  • Preprocessing: Resize + Normalization only
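The contents of create_train_val_split.py are not shown here. Because the classes are imbalanced, a split that samples each class separately (stratified) is one reasonable way to produce an external validation set. The sketch below is hypothetical and does not reproduce that script. It assumes one class label per image and a fixed seed so the split is reproducible.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.15, seed=0):
    """Split (sample_id, class_id) pairs into train/val lists, class by class.

    Hypothetical helper; not the contents of create_train_val_split.py.
    """
    by_class = defaultdict(list)
    for sample_id, class_id in samples:
        by_class[class_id].append((sample_id, class_id))

    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train, val = [], []
    for class_id in sorted(by_class):
        group = list(by_class[class_id])
        rng.shuffle(group)
        # Keep at least one validation sample, even for very small classes.
        n_val = max(1, round(len(group) * val_fraction))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Applied to the 663 training counts above, this helper puts 27 digital, 66 fisico, and 7 tesoreria samples in validation.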

Performance Metrics

Metric Definitions:

  • mAP (mean Average Precision): the primary metric, averaged across all classes and over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (COCO convention); ranges from 0.0 to 1.0, higher is better
  • mAP@50: mAP at a single IoU threshold of 0.5; more lenient, because a roughly placed box still counts as a correct detection
  • mAP@75: mAP at a single IoU threshold of 0.75; stricter, because it requires precise bounding-box localization
  • IoU (Intersection over Union): the area of overlap between a predicted box and a ground-truth box, divided by the area of their union
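Below is a minimal IoU sketch for two axis-aligned boxes in (x_min, y_min, x_max, y_max) pixel coordinates, which is the format returned by `post_process_object_detection`. `box_iou` is a hypothetical helper, not the evaluator used to produce this card's numbers.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the intersection rectangle (0 if the boxes do not overlap)
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred, truth = (1, 0, 11, 10), (0, 0, 10, 10)
iou = box_iou(pred, truth)  # 90 / 110, about 0.818
print(iou >= 0.5, iou >= 0.75)  # True True: a match under both mAP@50 and mAP@75
```

Shifting the prediction to (5, 0, 15, 10) drops the IoU to 50 / 150, about 0.33, which fails both thresholds.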

Performance Ranges:

  • 0.9+: Excellent
  • 0.8-0.9: Very Good
  • 0.7-0.8: Good
  • 0.5-0.7: Fair
  • <0.5: Poor (needs improvement)
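The bands above can be expressed as a lookup. This sketch assumes that each band's lower bound is inclusive (so 0.8 counts as "Very Good"). It also treats negative scores as not applicable, because COCO-style evaluators report -1 when no ground-truth objects fall in a category. `rating` is a hypothetical helper.

```python
def rating(score: float) -> str:
    """Map an AP/mAP score to the card's qualitative band (lower bounds assumed inclusive)."""
    if score < 0:
        return "N/A (no ground truth)"  # -1 sentinel from COCO-style evaluators
    if score >= 0.9:
        return "Excellent"
    if score >= 0.8:
        return "Very Good"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    return "Poor (needs improvement)"


print(rating(0.0))  # Poor (needs improvement)
```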

Final Evaluation Results:

Overall Detection Performance:

  • mAP: 0.0000
  • mAP@50: 0.0000
  • mAP@75: 0.0000

Per-Class Average Precision:

  • Digital invoices: 0.0000 (needs improvement)
  • Fisico receipts: 0.0000 (needs improvement)
  • Tesoreria receipts: 0.0000 (needs improvement)

Model Confidence:

  • Digital invoices mean confidence: 0.4218 (low)
  • Fisico receipts mean confidence: 0.3837 (low)
  • Tesoreria receipts mean confidence: 0.0000 (low)

Performance by Object Size:

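  • Small objects: -1.0000 (not applicable: no small objects in the evaluation set)
  • Medium objects: -1.0000 (not applicable: no medium objects in the evaluation set)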
  • Large objects: 0.0000
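COCO-style evaluation assigns objects to size buckets by area: small is below 32² px, medium is between 32² and 96² px, and large is above 96² px. A bucket with no ground-truth objects reports -1. The sketch below assumes area is computed from the box width and height. COCO annotations carry their own `area` field, and edge handling at exactly 32² or 96² may differ between evaluators. `coco_size_bucket` is a hypothetical helper.

```python
SMALL_MAX = 32 ** 2   # 1024 px²
MEDIUM_MAX = 96 ** 2  # 9216 px²


def coco_size_bucket(box):
    """Classify a (x_min, y_min, x_max, y_max) box into a COCO-style size bucket."""
    x_min, y_min, x_max, y_max = box
    area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


print(coco_size_bucket((0, 0, 600, 800)))  # large
```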

Evaluation Dataset:

  • Digital invoices: 53 samples (27.5%)
  • Fisico receipts: 127 samples (65.8%)
  • Tesoreria receipts: 13 samples (6.7%)
  • Total evaluation samples: 193

Model Configuration:

  • Base model: PekingU/rtdetr_v2_r101vd
  • Architecture: rtdetr_v2_r101vd
  • Input resolution: 832×832 pixels
  • Training epochs: 80
  • Batch size: 32

Training Hardware:

  • GPU: NVIDIA H100 80GB HBM3
  • VRAM: 79.2 GB
  • RAM: 235.9 GB
  • GPU configuration: H100 optimized

Training Time: 27.0 minutes

Training Summary:

  • Final training loss: 10.7460
  • Final learning rate: 1.77e-11

MLflow Tracking

  • MLflow Run ID: 6b50f63a6e3144b7a719bbb2b15cb77a
  • MLflow Experiment: RT-DETRv2_Voucher_Classification

Usage

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results: rescale boxes to the original image size
target_sizes = torch.tensor([image.size[::-1]])  # PIL size is (width, height); flip to (height, width)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,  # minimum confidence score for a detection to be kept
)[0]

# Print predictions (boxes are [x_min, y_min, x_max, y_max] in pixels)
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
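The card frames the task as voucher classification, but the model outputs boxes. One simple way to reduce the detections to a single label per image is to take the highest-scoring detection. `classify_voucher` and its `min_score` default of 0.3 are hypothetical choices, not part of the model or the Transformers API.

```python
from typing import Iterable, Mapping, Optional, Tuple


def classify_voucher(
    detections: Iterable[Tuple[float, int]],
    id2label: Mapping[int, str],
    min_score: float = 0.3,  # hypothetical cut-off; tune it on validation data
) -> Optional[str]:
    """Return the class name of the highest-scoring detection, or None if none clears min_score."""
    best = max(detections, key=lambda d: d[0], default=None)
    if best is None or best[0] < min_score:
        return None
    _, label_id = best
    return id2label[label_id]


labels = {0: "digital", 1: "fisico", 2: "tesoreria"}  # assumed to match model.config.id2label
print(classify_voucher([(0.42, 0), (0.61, 1)], labels))  # fisico
```

With the Transformers outputs above, you can call it as `classify_voucher(zip(results["scores"].tolist(), results["labels"].tolist()), model.config.id2label)`. The mean confidences reported above are below 0.5, so you may need to lower the `threshold` passed to `post_process_object_detection` for any detections to reach this helper.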

Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with:

  • Pre-augmented dataset focusing on challenging cases
  • Format-specific augmentation strategies applied during data preparation
  • MLflow experiment tracking for reproducibility
  • External train/validation split required for unbiased evaluation (evaluation never falls back to the training data)

Limitations and Bias

  • Trained specifically on voucher/receipt images
  • Performance may vary on images that differ significantly from the training distribution
  • Optimized for the 3-class voucher classification task

Citation

If you use this model, please cite:

@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
Model Files:

  • Format: Safetensors
  • Model size: 76.6M params
  • Tensor type: F32