# NoctOWL: Fine-Grained Open-Vocabulary Object Detector

## Model Description
NoctOWL (Not only coarse-text OWL) is an adaptation of OWL-ViT (NoctOWL) and OWLv2 (NoctOWLv2), designed for Fine-Grained Open-Vocabulary Detection (FG-OVD). Unlike standard open-vocabulary object detectors, which focus primarily on class-level recognition, NoctOWL enhances the ability to detect and distinguish fine-grained object attributes such as color, material, transparency, and pattern.
It maintains a balanced trade-off between fine- and coarse-grained detection, making it particularly effective in scenarios requiring detailed object descriptions.
You can find the original code to train and evaluate the model here.
## Model Variants

- NoctOWL Base (`lorebianchi98/NoctOWL-base-patch16`)
- NoctOWLv2 Base (`lorebianchi98/NoctOWLv2-base-patch16`)
- NoctOWL Large (`lorebianchi98/NoctOWL-large-patch14`)
- NoctOWLv2 Large (`lorebianchi98/NoctOWLv2-large-patch14`)
## Usage

### Loading the Model
```python
from transformers import (
    OwlViTForObjectDetection,
    OwlViTProcessor,
    Owlv2ForObjectDetection,
    Owlv2Processor,
)

# Load NoctOWL model (OWL-ViT backbone)
model = OwlViTForObjectDetection.from_pretrained("lorebianchi98/NoctOWL-base-patch16")
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch16")

# Load NoctOWLv2 model (OWLv2 backbone)
model_v2 = Owlv2ForObjectDetection.from_pretrained("lorebianchi98/NoctOWLv2-base-patch16")
processor_v2 = Owlv2Processor.from_pretrained("google/owlv2-base-patch16")
```
### Inference Example
```python
from PIL import Image
import torch

# Load image
image = Image.open("example.jpg")

# Define text prompts (fine-grained descriptions)
text_queries = ["a red patterned dress", "a dark brown wooden chair"]

# Process inputs
inputs = processor(images=image, text=text_queries, return_tensors="pt")

# Run inference (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)

# Extract raw predictions
logits = outputs.logits      # per-query similarity scores against the text prompts
boxes = outputs.pred_boxes   # boxes in normalized (cx, cy, w, h) format

# Post-processing can be applied to threshold scores, rescale boxes, and visualize results
```
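To make the post-processing step concrete, the sketch below decodes the raw outputs manually for a single image: a sigmoid turns the logits into per-prompt probabilities, and the normalized center-format boxes are converted to absolute corner coordinates. The function name, tensor shapes, and dummy values are illustrative assumptions, not part of the NoctOWL API.

```python
import torch

def decode_detections(logits, pred_boxes, image_size, threshold=0.1):
    """Decode raw OWL-style outputs for one image (illustrative sketch).

    logits:     (num_queries, num_texts) raw similarity scores
    pred_boxes: (num_queries, 4) boxes as normalized (cx, cy, w, h)
    image_size: (width, height) of the original image in pixels
    """
    probs = torch.sigmoid(logits)          # per-prompt detection probabilities
    scores, labels = probs.max(dim=-1)     # best-matching text prompt per box

    # Convert (cx, cy, w, h) -> (x0, y0, x1, y1) in absolute pixels
    cx, cy, w, h = pred_boxes.unbind(-1)
    img_w, img_h = image_size
    boxes = torch.stack(
        [(cx - w / 2) * img_w, (cy - h / 2) * img_h,
         (cx + w / 2) * img_w, (cy + h / 2) * img_h],
        dim=-1,
    )

    keep = scores > threshold              # drop low-confidence queries
    return boxes[keep], scores[keep], labels[keep]

# Dummy tensors standing in for one image's outputs (batch dimension removed)
dummy_logits = torch.tensor([[2.0, -3.0], [-4.0, -5.0]])
dummy_boxes = torch.tensor([[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.05, 0.05]])
boxes, scores, labels = decode_detections(dummy_logits, dummy_boxes, (640, 480))
```

In practice, the `transformers` helper `processor.post_process_object_detection(outputs, threshold=..., target_sizes=...)` performs this decoding for you, including batch handling.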
## Results
We report the mean Average Precision (mAP) on the Fine-Grained Open-Vocabulary Detection (FG-OVD) benchmarks across different difficulty levels, as well as performance on rare classes from the LVIS dataset.
| Model | LVIS (Rare) | Trivial | Easy | Medium | Hard | Color | Material | Pattern | Transparency |
|---|---|---|---|---|---|---|---|---|---|
| OWL (B/16) | 20.6 | 53.9 | 38.4 | 39.8 | 26.2 | 45.3 | 37.3 | 26.6 | 34.1 |
| OWL (L/14) | 31.2 | 65.1 | 44.0 | 39.3 | 26.5 | 43.8 | 44.9 | 36.0 | 29.2 |
| OWLv2 (B/16) | 29.6 | 52.9 | 40.0 | 38.5 | 25.3 | 45.1 | 33.5 | 19.2 | 28.5 |
| OWLv2 (L/14) | 34.9 | 63.2 | 42.8 | 41.2 | 25.4 | 53.3 | 36.9 | 23.3 | 12.2 |
| NoctOWL (B/16) | 11.6 | 46.6 | 44.4 | 45.6 | 40.0 | 44.7 | 46.0 | 46.1 | 53.6 |
| NoctOWL (L/14) | 26.0 | 57.4 | 54.2 | 54.8 | 48.6 | 53.1 | 56.9 | 49.8 | 57.2 |
| NoctOWLv2 (B/16) | 17.5 | 48.3 | 49.1 | 47.1 | 42.1 | 46.8 | 48.2 | 42.2 | 50.2 |
| NoctOWLv2 (L/14) | 27.2 | 57.5 | 55.5 | 57.2 | 50.2 | 55.6 | 57.0 | 49.2 | 55.9 |
Base model for `lorebianchi98/NoctOWLv2-large-patch14`: `google/owlv2-large-patch14`