---
license: cc-by-nc-4.0
language:
- en
tags:
- vila
- nvila
- conversational
- multimodal
---

Install the dependencies:

```bash
# other transformers versions may also work, but we have not tested them
pip install transformers==4.46 accelerate opencv-python torchvision einops pillow
pip install git+https://github.com/bfshi/scaling_on_scales.git
```

## Usage

```python
import PIL.Image
from transformers import AutoConfig, AutoModel

model_path = "Efficient-Large-Model/NVILA-Lite-2B-Verifier"

# Use one of the two loading options below.
# Option 1: build the model from its config
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_config(config, trust_remote_code=True)
# Option 2: load directly with from_pretrained
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map="auto")

yes_id = model.tokenizer.encode("yes", add_special_tokens=False)[0]
no_id = model.tokenizer.encode("no", add_special_tokens=False)[0]
# candidate images generated for the same prompt
files = [
    "output/sana_test_prompt/0.png",
    "output/sana_test_prompt/1.png",
]

prompt = "YOUR_GENERATED_PROMPT"

prompt = f"""You are an AI assistant specializing in image analysis and ranking. Your task is to analyze and compare image based on how well they match the given prompt. 
The given prompt is:{prompt}. Please consider the prompt and the image to make a decision and response directly with 'yes' or 'no'.
"""
            
r1, scores1 = model.generate_content([
    PIL.Image.open(files[0]),
    prompt
])

r2, scores2 = model.generate_content([
    PIL.Image.open(files[1]),
    prompt
])

if r1 == r2:
    if r1 == "yes":
        # both answers are "yes": pick the image with the higher "yes" score
        if scores1[0][0, yes_id] > scores2[0][0, yes_id]:
            selected_file = files[0]
        else:
            selected_file = files[1]
    else:
        # both answers are "no": pick the image with the lower "no" score
        if scores1[0][0, no_id] < scores2[0][0, no_id]:
            selected_file = files[0]
        else:
            selected_file = files[1]
else:
    if r1 == "yes":
        selected_file = files[0]
    else:
        selected_file = files[1]

```
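
The pairwise comparison above can be generalized to any number of candidate images. The following is a minimal sketch, not part of the model's API: `select_best` and its tuple layout are hypothetical names introduced here for illustration, and it assumes you have already extracted the answer and the "yes"/"no" scores for each image as plain floats (for example via `scores[0][0, yes_id].item()`).

```python
from typing import List, Tuple

# Hypothetical helper (not provided by the model): each candidate is
# (file, answer, yes_score, no_score), with scores already converted to floats.
Candidate = Tuple[str, str, float, float]


def select_best(candidates: List[Candidate]) -> str:
    """Return the file whose image best matches the prompt.

    Ranking mirrors the two-image logic above:
    - any "yes" answer beats any "no" answer;
    - among "yes" answers, the higher "yes" score wins;
    - among "no" answers, the lower "no" score wins.
    """
    def rank(candidate: Candidate) -> Tuple[int, float]:
        _, answer, yes_score, no_score = candidate
        if answer == "yes":
            return (1, yes_score)
        return (0, -no_score)

    # max() returns the first candidate on an exact tie
    return max(candidates, key=rank)[0]
```

Note that on an exact score tie this sketch keeps the first candidate, whereas the two-image code above falls through to `files[1]`.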