---
license: apache-2.0
---

# What is Yi-VL?

## Architecture

Yi-VL adopts the [LLaVA](https://github.com/haotian-liu/LLaVA) architecture, which is composed of three primary components:

- Vision Transformer (ViT): initialized with the [CLIP ViT-H/14 model](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and used to encode images.
- Projection Module: aligns image features with the text feature space. It consists of a two-layer Multilayer Perceptron (MLP) with layer normalization.
- Large Language Model (LLM): initialized with [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) or [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat), which is proficient at understanding and generating both English and Chinese.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/656d9adce8bf55919aca7c3f/EGVHSWG4kAcX01xDaoeXS.png)

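The data flow through the three components can be sketched in plain Python. This is a toy illustration, not the Yi-VL implementation. The layer order inside the projector (Linear, LayerNorm, GELU, Linear, LayerNorm), the GELU activation, the omission of LayerNorm's learned scale and shift, the random weights, and all dimensions are assumptions chosen for readability. The names `Projector` and `num_patches` are hypothetical. The sketch also assumes a square input image. With 14×14 patches, an image of side S yields (S/14)² patch features, and 224 and 448 pixels are used only as example resolutions. Placing the image tokens before the text is a simplification. In LLaVA-style models, the projected image features replace an image placeholder token inside the prompt.

```python
import math
import random


def num_patches(image_size, patch_size=14):
    """Patches a ViT with patch_size x patch_size patches cuts a square image into."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side * side


def linear(x, weight, bias):
    """y = W x + b for one feature vector; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-5):
    """Zero-mean, unit-variance normalization (learned scale/shift omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(x):
    """Exact GELU: 0.5 * v * (1 + erf(v / sqrt(2)))."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


class Projector:
    """Hypothetical two-layer MLP mapping ViT features to the LLM hidden size."""

    def __init__(self, vision_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        # Random weights stand in for trained parameters.
        self.w1 = [[rng.gauss(0.0, 0.02) for _ in range(vision_dim)] for _ in range(llm_dim)]
        self.b1 = [0.0] * llm_dim
        self.w2 = [[rng.gauss(0.0, 0.02) for _ in range(llm_dim)] for _ in range(llm_dim)]
        self.b2 = [0.0] * llm_dim

    def __call__(self, feature):
        # Assumed order: Linear -> LayerNorm -> GELU -> Linear -> LayerNorm.
        hidden = gelu(layer_norm(linear(feature, self.w1, self.b1)))
        return layer_norm(linear(hidden, self.w2, self.b2))


# Toy sizes; the real ViT and LLM hidden sizes are much larger.
VISION_DIM, LLM_DIM, NUM_PATCHES = 8, 4, 3

rng = random.Random(42)
# 1) ViT output: one feature vector per image patch (random stand-ins here).
patch_features = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
# 2) Projection: map every patch feature into the LLM embedding space.
projector = Projector(VISION_DIM, LLM_DIM)
image_tokens = [projector(f) for f in patch_features]
# 3) LLM input: image tokens followed by (stand-in) text token embeddings.
text_tokens = [[0.1 * i] * LLM_DIM for i in range(2)]
llm_input = image_tokens + text_tokens

print(num_patches(224), num_patches(448))  # 256 1024
print(len(llm_input), len(llm_input[0]))  # 5 4
```

The projector changes only the feature width (here 8 to 4) and not the number of tokens. Each image patch therefore becomes one position in the LLM's input sequence, which is why the input resolution directly affects the prompt length.
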
# How to use Yi-VL?

## Quick start

Yi-VL is supported in [SGLang](https://github.com/sgl-project/sglang), so you can call the model by defining an SGLang function like this:

```python
import sglang as sgl


@sgl.function
def image_qa(s, image_path, question):
    # Put the image and the question in a single user turn,
    # then let the model generate its reply into the "answer" variable.
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer"))


# Launch a local SGLang runtime that serves Yi-VL-34B.
runtime = sgl.Runtime(model_path="BabyChou/Yi-VL-34B",
                      tokenizer_path="BabyChou/Yi-VL-34B")
sgl.set_default_backend(runtime)

# Ask a single question about a single local image.
state = image_qa.run(
    image_path="images/cat.jpeg",
    question="What is this?",
    max_new_tokens=64)
print(state["answer"], "\n")

# Stop the runtime and release the GPU memory it holds.
runtime.shutdown()
```

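SGLang functions also provide a `run_batch` method that takes a list of keyword-argument dicts, which lets you ask several questions at once. The helper below is a hypothetical pre-flight sketch (`build_batch_args` is not part of SGLang). It builds that list and checks that every image file exists before any GPU work starts. It assumes your SGLang version exposes `run_batch` with this list-of-dicts interface. The batch call is shown only as a comment because it needs the running model.

```python
from pathlib import Path


def build_batch_args(pairs):
    """Turn (image_path, question) pairs into keyword-argument dicts.

    Each dict matches the parameters of the image_qa function, so the list
    can be handed to image_qa.run_batch(...). Missing images are reported
    up front instead of partway through a long batch.
    """
    batch = []
    for image_path, question in pairs:
        if not Path(image_path).is_file():
            raise FileNotFoundError(f"image not found: {image_path}")
        batch.append({"image_path": str(image_path), "question": question})
    return batch


# Usage, with the runtime from the quick start still running:
# states = image_qa.run_batch(
#     build_batch_args([("images/cat.jpeg", "What is this?"),
#                       ("images/cat.jpeg", "What color is the cat?")]),
#     max_new_tokens=64)
# for state in states:
#     print(state["answer"])
```

Validating paths before the batch starts is a design choice. A typo in one path then fails immediately, before the batch begins and before any generation work is spent on the other questions.
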
## License

For the license of the source code, please refer to the [acknowledgments and attributions](#acknowledgments_and_attributions) and to the individual components.

The Yi series models are fully open for academic research and free for commercial use, with permission granted automatically upon application.

All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE).

For free commercial use, you only need to send an email to obtain official commercial permission.