---
license: apache-2.0
tags:
- multi-modal
- language model
- LLaMA
base_model:
- meta-llama/Llama-3.2-11B-Vision
library_name: transformers
pipeline_tag: image-text-to-text
datasets:
- PKU-Alignment/align-anything
widget:
  - example_title: Lake Travel
    messages:
      - role: user
        content:
          - type: text
            text: What are the things I should be cautious about when I visit here?
          - type: image_url
            image_url:
              url: >-
                https://huggingface.co/PKU-Alignment/Pony-11B-Vision/resolve/main/example/001.jpg
      - role: assistant
        content: >
          When visiting a place like this, which appears to be a serene lake
          with a dock and surrounded by forested mountains, you should be
          cautious of several factors:

          1. **Weather Conditions**: Be aware of changing weather conditions
          such as sudden rainstorms or extreme temperatures, which can make
          activities dangerous.

          2. **Water Activities**: If engaging in water activities like swimming
          or fishing, ensure you're safe around the water, especially if there
          are strong currents or deep areas.

          3. **Wildlife**: The surrounding forest might have wildlife; stay
          informed about potential encounters and know how to react safely.

          4. **Injuries on the Dock**: Walking on docks can be slippery, so
          watch your step to avoid falls or injuries.

          5. **Environmental Protection**: Respect nature by not littering and
          staying on designated paths to protect local flora and fauna.

          6. **Physical Activity**: Engage in physical activity responsibly,
          ensuring you’re prepared for outdoor exertion and have necessary
          supplies like food, water, and first aid.

          7. **Isolation**: If venturing into remote areas, be mindful of
          isolation risks—let someone know your plans and expected return time.

          8. **Equipment Check**: When using equipment such as kayaks or camping
          gear, ensure everything is in good condition before use.

          9. **Local Regulations**: Follow any posted signs or regulations meant
          to preserve the environment and enhance safety for visitors.

          10. **Emergency Preparedness**: Know basic emergency procedures and
          carry essential items like a map, compass, flashlight, and first-aid
          kit.

          By being cautious of these factors, you can enjoy your visit more
          safely.
  - example_title: Unusual Man
    messages:
      - role: user
        content:
          - type: text
            text: What is unusual about this image?
          - type: image_url
            image_url:
              url: >-
                https://huggingface.co/PKU-Alignment/Pony-11B-Vision/resolve/main/example/002.jpg
      - role: assistant
        content: >
          The unusual aspect of this image is that a person is ironing clothes
          on the back of a moving taxi. This is an unconventional and unsafe
          activity, as it involves using an electric iron in close proximity to
          other vehicles and potentially causing harm or damage. Additionally,
          the presence of the ironing board and iron on the side of a speeding
          vehicle defies common sense and safety protocols.
  - example_title: Mario Themed Mugs
    messages:
      - role: user
        content:
          - type: text
            text: Write an attractive product description for this.
          - type: image_url
            image_url:
              url: >-
                https://huggingface.co/PKU-Alignment/Pony-11B-Vision/resolve/main/example/003.jpg
      - role: assistant
        content: >
          **Classic Mario Themed Mugs**

          Step into the world of nostalgia with these vibrant Classic Mario
          themed mugs! Featuring the beloved character from the iconic Super
          Mario video game series, these mugs are a perfect blend of retro charm
          and modern appeal.

          - **Vibrant Graphics**: Each mug is adorned with high-quality prints
          of the classic Mario character, capturing his playful spirit and
          adventurous personality.

          - **High-Quality Material**: Made from durable ceramic, these mugs
          ensure longevity and resistance to wear and tear, making them ideal
          for both daily use and decorative purposes.

          - **Perfect Gift**: Whether you're a long-time fan of the Mario
          franchise or looking for a fun gift idea, these mugs are sure to
          delight gamers of all ages.

          - **Versatile Use**: Ideal for coffee, tea, hot chocolate, or even as
          a quirky storage container for pens and pencils, these mugs add a
          touch of gaming joy to any setting.

          Bring home a piece of gaming history with these Classic Mario themed
          mugs and let your morning routine be as exciting as rescuing a
          princess from a towering castle!
---

# 🦫 Beaver-Vision-11B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6527ac06b723b49e8c911e89/in9t4MCwlwJVtm9RxZvoV.png)

Beaver-Vision-11B is an <u>Image-Text-to-Text</u> chat assistant fine-tuned from the **pretrained** (base, non-instruct) [LLaMA-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) model using the [Align-Anything-Instruct](https://huggingface.co/datasets/PKU-Alignment/Align-Anything) dataset and the [Align-Anything](https://github.com/PKU-Alignment/align-anything) framework.

Beaver-Vision-11B aims to enhance the instruction-following abilities of MLLMs (Multi-modal Large Language Models).
Compared with [LLaMA-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct), Beaver-Vision-11B is post-trained on the [Align-Anything-Instruct](https://huggingface.co/datasets/PKU-Alignment/Align-Anything) dataset with an alignment method and achieves better results on the benchmarks reported below. More importantly, Beaver-Vision-11B open-sources all of its training data, code, and evaluation scripts, making it easier for the community and researchers to reproduce and build on this work.

- **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
- **Model Type:** An auto-regressive multi-modal (Image-Text-to-Text) language model based on the transformer architecture.
- **Fine-tuned from model:** [LLaMA-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision).

## Model Sources

- **Repository:** <https://github.com/PKU-Alignment/align-anything>
- **Dataset:** <https://huggingface.co/datasets/PKU-Alignment/Align-Anything>

## Evaluation Results

|                             | Metrics     | LLaMA-3.2-11B-Vision-Instruct | Beaver-Vision-11B |
|-----------------------------|-------------|-------------------------------|-----------------|
| **LLaVA-Bench-Coco**        | Rating      |             102.36            |      104.67     |
| **LLaVA-Bench-in-the-Wild** | Rating      |             92.76             |      99.13      |
| **POPE**                    | Accuracy    |             80.71             |      87.07      |
|                             | Recall      |             70.95             |      83.02      |
|                             | Precision   |             88.35             |      90.46      |
|                             | F1 Score    |             78.65             |      86.55      |
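
POPE's F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the snippet below recomputes F1 from the rounded precision and recall values in the table. The results land within about 0.05 points of the reported F1; we assume the small gap comes from rounding, or from F1 being averaged over POPE's evaluation splits, which this card does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall values copied from the POPE rows of the table above.
pope = {
    "LLaMA-3.2-11B-Vision-Instruct": (88.35, 70.95),
    "Beaver-Vision-11B": (90.46, 83.02),
}

for model, (p, r) in pope.items():
    # Recomputed from rounded inputs, so it may differ slightly from the table.
    print(f"{model}: recomputed F1 = {f1_score(p, r):.2f}")
```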

## Example

### Comparison

|Role|Content|
|---|---|
|User|What's the name of the restaurant serving these dishes?|
|| <img src="https://huggingface.co/PKU-Alignment/Pony-11B-Vision/resolve/main/example/005.jpg?raw=true" width="200" height="auto"> |
|LLaMA-3.2-11B-Vision-Instruct|The dish in the image is from a ramen restaurant called \"Ippudo.\" Ippudo is known for its high-quality, authentic Japanese ramen. The bowl with the distinctive design and rich broth is characteristic of their signature Ramen.|
|GPT-4o|The restaurant serving these dishes is ICHIRAN.|
|Beaver-Vision-11B|The image shows that the restaurant is ICHIRAN. Because it appears to be a ramen dish served in a traditional Japanese-style bowl with various accompaniments such as green onions and possibly some kind of meat or broth flavoring. Additionally, the presence of the ICHIRAN logo on the dishware helps identify the restaurant.|

In this example, LLaMA-3.2-11B-Vision-Instruct fails to read the logo text on the bowl and hallucinates a different restaurant; GPT-4o identifies the restaurant correctly but gives no supporting explanation, while Beaver-Vision-11B gives both the correct answer and the visual evidence behind it.

### Further Examples

| <img src="https://huggingface.co/PKU-Alignment/Beaver-Vision-11B/resolve/main/example/001.jpg?raw=true" width="200" height="auto"> | <img src="https://huggingface.co/PKU-Alignment/Beaver-Vision-11B/resolve/main/example/002.jpg?raw=true" width="200" height="auto"> | <img src="https://huggingface.co/PKU-Alignment/Beaver-Vision-11B/resolve/main/example/003.jpg?raw=true" width="200" height="auto"> |
|---|---|---| 
| <details><summary>prompt</summary>What are the things I should be cautious about when I visit here?</details> | <details><summary>prompt</summary>What is unusual about this image?</details> | <details><summary>prompt</summary>Write an attractive product description for this.</details> |
| <details><summary>response</summary>When visiting a place like this, which appears to be a serene lake with a dock and surrounded by forested mountains, you should be cautious of several factors:<br><br>1. **Weather Conditions**: Be aware of changing weather conditions such as sudden rainstorms or extreme temperatures, which can make activities dangerous.<br><br>2. **Water Activities**: If engaging in water activities like swimming or fishing, ensure you're safe around the water, especially if there are strong currents or deep areas.<br><br>3. **Wildlife**: The surrounding forest might have wildlife; stay informed about potential encounters and know how to react safely.<br><br>4. **Injuries on the Dock**: Walking on docks can be slippery, so watch your step to avoid falls or injuries.<br><br>5. **Environmental Protection**: Respect nature by not littering and staying on designated paths to protect local flora and fauna.<br><br>6. **Physical Activity**: Engage in physical activity responsibly, ensuring you’re prepared for outdoor exertion and have necessary supplies like food, water, and first aid.<br><br>7. **Isolation**: If venturing into remote areas, be mindful of isolation risks—let someone know your plans and expected return time.<br><br>8. **Equipment Check**: When using equipment such as kayaks or camping gear, ensure everything is in good condition before use.<br><br>9. **Local Regulations**: Follow any posted signs or regulations meant to preserve the environment and enhance safety for visitors.<br><br>10. **Emergency Preparedness**: Know basic emergency procedures and carry essential items like a map, compass, flashlight, and first-aid kit.<br><br>By being cautious of these factors, you can enjoy your visit more safely.</details> | <details><summary>response</summary>The unusual aspect of this image is that a person is ironing clothes on the back of a moving taxi. This is an unconventional and unsafe activity, as it involves using an electric iron in close proximity to other vehicles and potentially causing harm or damage. Additionally, the presence of the ironing board and iron on the side of a speeding vehicle defies common sense and safety protocols.</details> | <details><summary>response</summary>**Classic Mario Themed Mugs**<br><br>Step into the world of nostalgia with these vibrant Classic Mario themed mugs! Featuring the beloved character from the iconic Super Mario video game series, these mugs are a perfect blend of retro charm and modern appeal. <br><br>- **Vibrant Graphics**: Each mug is adorned with high-quality prints of the classic Mario character, capturing his playful spirit and adventurous personality.<br>  <br>- **High-Quality Material**: Made from durable ceramic, these mugs ensure longevity and resistance to wear and tear, making them ideal for both daily use and decorative purposes.<br><br>- **Perfect Gift**: Whether you're a long-time fan of the Mario franchise or looking for a fun gift idea, these mugs are sure to delight gamers of all ages.<br><br>- **Versatile Use**: Ideal for coffee, tea, hot chocolate, or even as a quirky storage container for pens and pencils, these mugs add a touch of gaming joy to any setting.<br><br>Bring home a piece of gaming history with these Classic Mario themed mugs and let your morning routine be as exciting as rescuing a princess from a towering castle!</details> |

 
## Usage

```python
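import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "PKU-Alignment/Beaver-Vision-11B"

# Load the weights in bfloat16 and let Accelerate place them on the available device(s).
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Download an example image.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The {"type": "image"} entry marks where the image is placed in the prompt.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
# Render the chat template; add_generation_prompt=True appends the assistant header.
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
# The rendered template already contains special tokens such as <|begin_of_text|>,
# so the processor must not add them a second time.
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
# Decodes the full sequence, i.e. the prompt followed by the generated reply.
print(processor.decode(output[0]))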
```
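
The widget examples in this card's front matter describe images as `{"type": "image_url", "image_url": {"url": ...}}` parts, while the Usage snippet uses a bare `{"type": "image"}` placeholder and passes the image to the processor separately. The sketch below converts the former into the latter; `split_widget_messages` is a hypothetical helper, not part of `transformers` or Align-Anything, and it assumes the chat template accepts plain-string content (e.g. an assistant reply) unchanged.

```python
from typing import Any


def split_widget_messages(
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Hypothetical helper: turn widget-style messages into processor-style ones.

    Each image_url part is replaced by an {"type": "image"} placeholder, and the
    URLs are returned separately in order of appearance.
    """
    converted: list[dict[str, Any]] = []
    image_urls: list[str] = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            # Plain-string content carries no images; keep it as-is.
            converted.append({"role": message["role"], "content": content})
            continue
        parts: list[dict[str, Any]] = []
        for part in content:
            if part["type"] == "image_url":
                image_urls.append(part["image_url"]["url"])
                parts.append({"type": "image"})
            else:
                parts.append(part)
        converted.append({"role": message["role"], "content": parts})
    return converted, image_urls
```

For the single-image widget examples, the returned messages can go straight into `processor.apply_chat_template`, and the one URL can be loaded with `Image.open(requests.get(url, stream=True).raw)` exactly as in the Usage snippet.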

## Citation

Please cite our work if you use the data or model in your paper.

```bibtex
@misc{align_anything,
  author = {PKU-Alignment Team},
  title = {Align Anything: training all modality models to follow instructions with unified language feedback},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/PKU-Alignment/align-anything}},
}
```

## License

Beaver-Vision-11B is released under the Apache License 2.0. Because it is derived from LLaMA-3.2-11B-Vision, you must also agree to the Llama 3.2 Community License.