---
license: other
license_name: yi-license
license_link: LICENSE
---


# What is Yi-VL?

## Architecture

Yi-VL adopts the [LLaVA](https://github.com/haotian-liu/LLaVA) architecture, which is composed of three primary components:

- Vision Transformer (ViT): initialized with the [CLIP ViT-H/14 model](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K); it encodes input images into feature representations.

- Projection Module: aligns image features with the text feature space; it consists of a two-layer Multilayer Perceptron (MLP) with layer normalizations.

- Large Language Model (LLM): initialized with [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) or [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat); it demonstrates exceptional proficiency in understanding and generating both English and Chinese.
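To make the projection module concrete, here is a minimal NumPy sketch of a two-layer MLP with layer normalizations mapping ViT features into the LLM's embedding space. The dimensions, the GELU activation, and the exact placement of the normalizations are illustrative assumptions, not the released implementation:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def projection(image_feats, w1, b1, w2, b2):
    """Two-layer MLP with layer normalizations: maps ViT features to the LLM feature space."""
    h = gelu(layer_norm(image_feats @ w1 + b1))
    return layer_norm(h @ w2 + b2)

# Hypothetical dimensions: ViT hidden size 1280 -> LLM hidden size 4096
rng = np.random.default_rng(0)
vit_dim, llm_dim = 1280, 4096
feats = rng.standard_normal((257, vit_dim))        # e.g. 256 image patches + 1 CLS token
w1 = rng.standard_normal((vit_dim, llm_dim)) * 0.02
b1 = np.zeros(llm_dim)
w2 = rng.standard_normal((llm_dim, llm_dim)) * 0.02
b2 = np.zeros(llm_dim)

print(projection(feats, w1, b1, w2, b2).shape)  # (257, 4096)
```

The projected image tokens then have the same dimensionality as the LLM's text embeddings, so they can be interleaved with the text tokens in a single input sequence.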

![image/png](https://cdn-uploads.huggingface.co/production/uploads/656d9adce8bf55919aca7c3f/EGVHSWG4kAcX01xDaoeXS.png)


# How to use Yi-VL?

## Quick start

Yi-VL is supported in the SGLang codebase; you can query the model by defining a function like the following:
```python
import sglang as sgl

@sgl.function
def image_qa(s, image_path, question):
    # Build a multimodal prompt: the image followed by the question
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer"))


runtime = sgl.Runtime(model_path="BabyChou/Yi-VL-6B",
                      tokenizer_path="BabyChou/Yi-VL-6B")
sgl.set_default_backend(runtime)


# Single query
state = image_qa.run(
    image_path="images/cat.jpeg",
    question="What is this?",
    max_new_tokens=64)
print(state["answer"], "\n")

runtime.shutdown()
```

## License

For the license of the source code, please refer to the [acknowledgments and attributions](#acknowledgments_and_attributions) as well as the individual components.

The Yi series models are fully open for academic research and free for commercial use; permission is granted automatically upon application.

All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE).

For free commercial use, you only need to send an email to obtain official commercial permission.