yongqiang
committed on
Commit
·
09c0149
1
Parent(s):
5a3d81c
update README
README.md
CHANGED
@@ -1,3 +1,76 @@
|
|
1 |
---
|
2 |
license: bsd-3-clause
|
3 |
---
1 |
---
|
2 |
license: bsd-3-clause
|
3 |
+
language:
|
4 |
+
- en
|
5 |
+
- zh
|
6 |
+
base_model:
|
7 |
+
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
|
8 |
+
pipeline_tag: visual-question-answering
|
9 |
+
tags:
|
10 |
+
- HuggingFaceTB
|
11 |
+
- SmolVLM2-500M-Video-Instruct
|
12 |
---
|
13 |
+
|
14 |
+
# SmolVLM2-500M-Video-Instruct-Int8
|
15 |
+
|
16 |
+
This version of SmolVLM2-500M-Video-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
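As a rough illustration of the "w8" half of **w8a16** (8-bit weights, 16-bit activations), the sketch below quantizes a short list of weights to int8 with one symmetric scale and then restores them. It is a generic scheme chosen for illustration, not necessarily what Pulsar2 does internally; the function names `quantize_weights_int8` and `dequantize` are hypothetical.

```python
def quantize_weights_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative assumption only).

    Returns the integer codes in [-127, 127] and the scale used to map
    them back to floating point.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.01, 1.0]
codes, scale = quantize_weights_int8(weights)
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error this kind of scheme accepts in exchange for storing weights in one byte.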
|
17 |
+
|
18 |
+
Compatible with Pulsar2 version: 4.0
|
19 |
+
|
20 |
+
## Conversion tool links
|
21 |
+
|
22 |
+
If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo:
|
23 |
+
- https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct
|
24 |
+
|
25 |
+
<!-- - [Github for SmolVLM2-500M-Video-Instruct.axera](https://github.com/AXERA-TECH/SmolVLM2-500M-Video-Instruct.axera) -->
|
26 |
+
- [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
|
27 |
+
|
28 |
+
## Supported Platforms
|
29 |
+
- AX650
|
30 |
+
- [M4N-Dock(η±θ―ζ΄ΎPro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
|
31 |
+
|
32 |
+
<!-- ## TODO Model infer time -->
|
33 |
+
|
34 |
+
## How to use
|
35 |
+
|
36 |
+
Download all files from this repository to the device.
|
37 |
+
|
38 |
+
**Using AX650 Board**
|
39 |
+
|
40 |
+
```bash
|
41 |
+
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ tree -L 1
|
42 |
+
.
|
43 |
+
├── assets
|
44 |
+
├── embeds
|
45 |
+
├── infer_axmodel.py
|
46 |
+
├── README.md
|
47 |
+
├── smolvlm2_axmodel
|
48 |
+
├── smolvlm2_tokenizer
|
49 |
+
└── vit_mdoel
|
50 |
+
|
51 |
+
5 directories, 2 files
|
52 |
+
```
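Before running inference, it can help to confirm that the download is complete. The following is a minimal sketch that checks a directory against the `tree -L 1` listing above; the helper name `missing_entries` is hypothetical, and the expected names are copied verbatim from that listing (including `vit_mdoel` as spelled there).

```python
from pathlib import Path

# Top-level entries copied from the `tree -L 1` output above.
EXPECTED_DIRS = [
    "assets",
    "embeds",
    "smolvlm2_axmodel",
    "smolvlm2_tokenizer",
    "vit_mdoel",
]
EXPECTED_FILES = ["infer_axmodel.py", "README.md"]


def missing_entries(root):
    """Return the expected top-level names that are absent under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries(".")` from the repository directory on the board should return an empty list when all five directories and two files are present.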
|
53 |
+
|
54 |
+
#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board
|
55 |
+
|
56 |
+
**Multimodal Understanding**
|
57 |
+
|
58 |
+
input image:
|
59 |
+
|
60 |
+

|
61 |
+
|
62 |
+
input text:
|
63 |
+
|
64 |
+
```
|
65 |
+
Can you describe this image?
|
66 |
+
```
|
67 |
+
|
68 |
+
log information:
|
69 |
+
|
70 |
+
```bash
|
71 |
+
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ python3 infer_axmodel.py
|
72 |
+
|
73 |
+
input prompt: Can you describe this image?
|
74 |
+
|
75 |
+
answer >> The image captures a close-up view of a pink flower, prominently featuring a bumblebee. The bumblebee, with its black and yellow stripes, is in the center of the frame, its body slightly tilted to the left. The flower, with its petals fully spread, is the main subject of the image. The background is blurred, drawing focus to the flower and the bumblebee. The blurred background suggests a garden or a field, providing a sense of depth to the image. The colors in the image are vibrant, with the pink of the flower contrasting against the green of the leaves and the brown of the stems. The image does not provide enough detail to confidently identify the specific location or landmark referred to as "sa_16743".
|
76 |
+
```
|