---
license: apache-2.0
base_model:
- google/owlv2-large-patch14
pipeline_tag: object-detection
---
# **NoctOWL: Fine-Grained Open-Vocabulary Object Detector**

## **Model Description**

**NoctOWL** (***N***ot **o**nly **c**oarse-**t**ext **OWL**) is a family of detectors for **Fine-Grained Open-Vocabulary Detection (FG-OVD)**, obtained by adapting **OWL-ViT** (*NoctOWL*) and **OWLv2** (*NoctOWLv2*). Unlike standard open-vocabulary object detectors, which focus primarily on class-level recognition, NoctOWL enhances the ability to detect and distinguish fine-grained object attributes such as color, material, transparency, and pattern.

It maintains a balanced **trade-off between fine- and coarse-grained detection**, making it particularly effective in scenarios that require detailed object descriptions.

You can find the original training and evaluation code [here](https://github.com/lorebianchi98/FG-OVD/tree/main/benchmarks).

### **Model Variants**
- **NoctOWL Base** (`lorebianchi98/NoctOWL-base-patch16`)
- **NoctOWLv2 Base** (`lorebianchi98/NoctOWLv2-base-patch16`)
- **NoctOWL Large** (`lorebianchi98/NoctOWL-large-patch14`)
- **NoctOWLv2 Large** (`lorebianchi98/NoctOWLv2-large-patch14`)

## **Usage**

### **Loading the Model**
```python
from transformers import OwlViTForObjectDetection, Owlv2ForObjectDetection, OwlViTProcessor, Owlv2Processor

# Load NoctOWL model
model = OwlViTForObjectDetection.from_pretrained("lorebianchi98/NoctOWL-base-patch16")
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch16")

# Load NoctOWLv2 model
model_v2 = Owlv2ForObjectDetection.from_pretrained("lorebianchi98/NoctOWLv2-base-patch16")
processor_v2 = Owlv2Processor.from_pretrained("google/owlv2-base-patch16")
```
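The large variants load the same way. A minimal sketch for **NoctOWLv2 Large**, assuming it pairs with the original `google/owlv2-large-patch14` processor (the base model of this checkpoint) and that a GPU may or may not be available:

```python
import torch
from transformers import Owlv2ForObjectDetection, Owlv2Processor

# Assumption: the large NoctOWLv2 checkpoint uses the original OWLv2 large processor,
# mirroring the base-size pairing shown above.
model_large = Owlv2ForObjectDetection.from_pretrained("lorebianchi98/NoctOWLv2-large-patch14")
processor_large = Owlv2Processor.from_pretrained("google/owlv2-large-patch14")

# Move to GPU if available and switch to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model_large = model_large.to(device).eval()
```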
38
+
39
+ ### **Inference Example**
40
+ ```python
41
+ from PIL import Image
42
+ import torch
43
+
44
+ # Load image
45
+ image = Image.open("example.jpg")
46
+
47
+ # Define text prompts (fine-grained descriptions)
48
+ text_queries = ["a red patterned dress", "a dark brown wooden chair"]
49
+
50
+ # Process inputs
51
+ inputs = processor(images=image, text=text_queries, return_tensors="pt")
52
+
53
+ # Run inference
54
+ outputs = model(**inputs)
55
+
56
+ # Extract detected objects
57
+ logits = outputs.logits
58
+ boxes = outputs.pred_boxes
59
+
60
+ # Post-processing can be applied to visualize results
61
+ ```
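A minimal post-processing sketch, continuing from the example above (it assumes the `processor`, `image`, `text_queries`, and `outputs` defined there; the 0.3 score threshold is an arbitrary choice for illustration). `post_process_object_detection` rescales the normalized boxes to the original image size and returns, for each image, the scores, query indices, and boxes of the kept detections:

```python
# Rescale boxes to the original image size; one (height, width) pair per image
target_sizes = torch.tensor([image.size[::-1]])

results = processor.post_process_object_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.3
)

# One result dict per image; a single image was passed above
detections = results[0]
for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    box = [round(coord, 2) for coord in box.tolist()]
    print(f"Detected '{text_queries[label]}' with confidence {score.item():.3f} at {box}")
```

The returned `labels` are indices into `text_queries`, which is why the loop uses them to recover the matching fine-grained description.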
## **Results**

We report the mean Average Precision (**mAP**) on the Fine-Grained Open-Vocabulary Detection ([FG-OVD](https://lorebianchi98.github.io/FG-OVD/)) benchmarks, both across difficulty levels (Trivial, Easy, Medium, Hard) and on the attribute-specific benchmarks (Color, Material, Pattern, Transparency), as well as on the rare classes of the LVIS dataset.

| Model | LVIS (Rare) | Trivial | Easy | Medium | Hard | Color | Material | Pattern | Transparency |
|-------|-------------|---------|------|--------|------|-------|----------|---------|--------------|
| OWL (B/16) | 20.6 | 53.9 | 38.4 | 39.8 | 26.2 | 45.3 | 37.3 | 26.6 | 34.1 |
| OWL (L/14) | 31.2 | 65.1 | 44.0 | 39.3 | 26.5 | 43.8 | 44.9 | 36.0 | 29.2 |
| OWLv2 (B/16) | 29.6 | 52.9 | 40.0 | 38.5 | 25.3 | 45.1 | 33.5 | 19.2 | 28.5 |
| OWLv2 (L/14) | **34.9** | 63.2 | 42.8 | 41.2 | 25.4 | 53.3 | 36.9 | 23.3 | 12.2 |
| **NoctOWL (B/16)** | 11.6 | 46.6 | 44.4 | 45.6 | 40.0 | 44.7 | 46.0 | 46.1 | 53.6 |
| **NoctOWL (L/14)** | 26.0 | 57.4 | 54.2 | 54.8 | 48.6 | 53.1 | 56.9 | **49.8** | **57.2** |
| **NoctOWLv2 (B/16)** | 17.5 | 48.3 | 49.1 | 47.1 | 42.1 | 46.8 | 48.2 | 42.2 | 50.2 |
| **NoctOWLv2 (L/14)** | 27.2 | **57.5** | **55.5** | **57.2** | **50.2** | **55.6** | **57.0** | 49.2 | 55.9 |