Model Card for deformable_detr_boxref_coco_convnext_v2_tiny_imagenet21k
A Deformable DETR with box refinement object detection model with ConvNeXt v2 Tiny backbone (pre-trained on ImageNet-21k) and trained on COCO 2017 dataset.
Custom Kernels: This model uses optimized custom kernels for Soft-NMS and Deformable Attention operations. If you encounter compilation issues or prefer to use pure PyTorch implementations, set the environment variable DISABLE_CUSTOM_KERNELS=1
before loading the model.
Model Details
Model Type: Object detection
Model Stats:
- Params (M): 40.0
- Input image size: 640 x 640 (short side)
Dataset: COCO 2017 (91 classes)
Papers:
- Deformable DETR: Deformable Transformers for End-to-End Object Detection: https://arxiv.org/abs/2010.04159
- Soft-NMS -- Improving Object Detection With One Line of Code: https://arxiv.org/abs/1704.04503
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders: https://arxiv.org/abs/2301.00808
Metrics:
- mAP @ 512x512px: 41.04
- mAP @ 640px (short side) BS=1 (w/o masking): 45.75
- mAP @ 640px (short side) BS=2 (w. masking): 45.77
- mAP @ 800px (short side) BS=1 (w/o masking): 46.68
- mAP @ 800px (short side) BS=2 (w. masking): 46.63
Model Usage
Object Detection
import birder
from birder.inference.detection import infer_image
(net, model_info) = birder.load_pretrained_model("deformable_detr_boxref_coco_convnext_v2_tiny_imagenet21k", inference=True)
# Can also load model with Soft-NMS by passing custom_config={"soft_nms": True}
# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)
# Create an inference transform
transform = birder.detection_transform(size, model_info.rgb_stats, dynamic_size=model_info.signature["dynamic"])
image = "path/to/image.jpeg" # or a PIL image, must be loaded in RGB format
detections = infer_image(net, image, transform)
# detections is a dict with keys: 'boxes', 'labels', 'scores'
# boxes: torch.Tensor with shape (N, 4) in [x1, y1, x2, y2] format
# labels: torch.Tensor with shape (N,) containing class indices
# scores: torch.Tensor with shape (N,) containing confidence scores
Citation
@misc{zhu2021deformabledetrdeformabletransformers,
title={Deformable DETR: Deformable Transformers for End-to-End Object Detection},
author={Xizhou Zhu and Weijie Su and Lewei Lu and Bin Li and Xiaogang Wang and Jifeng Dai},
year={2021},
eprint={2010.04159},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2010.04159},
}
@misc{bodla2017softnmsimprovingobject,
title={Soft-NMS -- Improving Object Detection With One Line of Code},
author={Navaneeth Bodla and Bharat Singh and Rama Chellappa and Larry S. Davis},
year={2017},
eprint={1704.04503},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/1704.04503},
}
@misc{woo2023convnextv2codesigningscaling,
title={ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders},
author={Sanghyun Woo and Shoubhik Debnath and Ronghang Hu and Xinlei Chen and Zhuang Liu and In So Kweon and Saining Xie},
year={2023},
eprint={2301.00808},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2301.00808},
}
- Downloads last month
- 112
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support