SmolVLM2-500M-Video-Instruct-Int8

This version of SmolVLM2-500M-Video-Instruct has been converted to run on the Axera NPU using w8a16 quantization.

Compatible with Pulsar2 version: 4.0

Convert tools links:

For those who are interested in model conversion, you can try to export axmodel through the original repo:

Support Platform

AX650
- M4N-Dock(爱芯派Pro)

How to use

Download all files from this repository to the device.

Using AX650 Board

ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ tree -L 1
.
├── assets
├── embeds
├── infer_axmodel.py
├── README.md
├── smolvlm2_axmodel
├── smolvlm2_tokenizer
└── vit_mdoel

5 directories, 2 files

Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board

Multimodal Understanding

input image

input text:

Can you describe this image?

log information:

ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ python3 infer_axmodel.py

input prompt: Can you describe this image?

answer >>  The image depicts a close-up view of a pink flower with a bee on it. The bee, which appears to be a bumblebee, is perched on the flower's center, which is surrounded by a cluster of other flowers. The bee is in the process of collecting nectar from the flower, which is a common behavior for bees. The flower itself has a yellow center with a cluster of yellow stamens surrounding it. The petals of the flower are a vibrant shade of pink, and the bee is positioned very close to^@ the camera, making it the focal point of the image. The background of the image is slightly blurred, but it appears to be a garden or a field with other flowers and plants, contributing to the overall natural setting of the image.

AXERA-TECH
/

SmolVLM2-500M-Video-Instruct

SmolVLM2-500M-Video-Instruct-Int8

Convert tools links:

Support Platform

How to use

Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board

Model tree for AXERA-TECH/SmolVLM2-500M-Video-Instruct

Collections including AXERA-TECH/SmolVLM2-500M-Video-Instruct

Multimodal Models

HuggingFaceTB