---
license: apache-2.0
---


# What is Yi-VL?

## Architecture

Yi-VL adopts the [LLaVA](https://github.com/haotian-liu/LLaVA) architecture, which is composed of three primary components:

- Vision Transformer (ViT): initialized with the [CLIP ViT-H/14 model](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and used to encode images.

- Projection Module: aligns image features with the text feature space; it consists of a two-layer Multilayer Perceptron (MLP) with layer normalizations.
  
- Large Language Model (LLM): initialized with [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) or [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat), both of which demonstrate strong understanding and generation of English and Chinese.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/656d9adce8bf55919aca7c3f/EGVHSWG4kAcX01xDaoeXS.png)
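To make the projection step above concrete, here is a minimal NumPy sketch of a two-layer MLP with layer normalizations that maps ViT patch features into the LLM's embedding space. The toy dimensions, the ReLU activation, and the placement of the normalizations are assumptions for illustration, not the released implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def projection_module(feats, w1, b1, w2, b2):
    # Two-layer MLP with layer normalizations mapping ViT patch
    # features into the LLM's text feature space (activation choice
    # and norm placement are assumptions, not the released code).
    h = layer_norm(feats @ w1 + b1)
    h = np.maximum(h, 0.0)  # assumed ReLU
    return layer_norm(h @ w2 + b2)

# Toy dimensions for illustration; the real model maps CLIP ViT-H/14
# features (1280-d) into the much larger Yi-34B hidden size.
rng = np.random.default_rng(0)
vit_dim, llm_dim = 16, 32
feats = rng.standard_normal((257, vit_dim))  # e.g. 257 patch tokens
w1, b1 = 0.1 * rng.standard_normal((vit_dim, llm_dim)), np.zeros(llm_dim)
w2, b2 = 0.1 * rng.standard_normal((llm_dim, llm_dim)), np.zeros(llm_dim)
print(projection_module(feats, w1, b1, w2, b2).shape)  # (257, 32)
```

After this projection, each image patch token lives in the same feature space as the LLM's text tokens, so the image can be spliced into the prompt sequence.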


# How to use Yi-VL?

## Quick start

Yi-VL is supported in the SGLang codebase, where you can query the model by defining a function like this:
```python
import sglang as sgl

@sgl.function
def image_qa(s, image_path, question):
    # Build a chat turn: the user message contains the image plus the question.
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer"))


# Launch a local runtime serving Yi-VL-34B and make it the default backend.
runtime = sgl.Runtime(model_path="BabyChou/Yi-VL-34B",
                      tokenizer_path="BabyChou/Yi-VL-34B")
sgl.set_default_backend(runtime)


# Run a single query.
state = image_qa.run(
    image_path="images/cat.jpeg",
    question="What is this?",
    max_new_tokens=64)
print(state["answer"], "\n")
```

## License

Please refer to the [acknowledgments and attributions](#acknowledgments_and_attributions), as well as the individual components, for the licenses that apply to the source code.

The Yi series models are fully open for academic research and free for commercial use; permission is granted automatically upon application.

All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE). 

For free commercial use, you only need to send an email to obtain official commercial permission.