This repository provides the HuggingFaceTB/SmolVLM-256M-Instruct model in TFLite format. You can run it with a custom C++ pipeline or with a Python pipeline (see the Colab example below). Please note that, at the moment, AI Edge Torch VLMs are not supported by the MediaPipe LLM Inference API; this includes the qwen_vl model, which served as the reference for the SmolVLM-256M-Instruct conversion scripts.
## Use the models

### Colab

### C++ inference
```shell
mkdir cache
bazel run --verbose_failures -c opt //ai_edge_torch/generative/examples/cpp_image:text_generator_main -- \
  --tflite_model="/home/dragynir/ai_vlm/ai-edge-torch-smalvlm/ai_edge_torch/generative/examples/smalvlm/models/SmolVLM-256M-Instruct-tflite-single/smalvlm-256m-instruct_q8_ekv2048.tflite" \
  --sentencepiece_model="/home/dragynir/ai_vlm/ai-edge-torch-smalvlm/ai_edge_torch/generative/examples/smalvlm/models/SmolVLM-256M-Instruct-tflite/tokenizer.model" \
  --start_token="<|im_start|>" --stop_token="<end_of_utterance>" --num_threads=16 \
  --prompt="User:<image>What in the image?<end_of_utterance>\nAssistant:" \
  --weight_cache_path="/home/dragynir/llm/ai-edge-torch/ai_edge_torch/generative/examples/cpp/cache/model.xnnpack_cache" \
  --use_single_image=true --image_path="/home/dragynir/ai_vlm/car.jpg" --max_generated_tokens=64
```
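The `--prompt` flag above follows SmolVLM's chat layout: a `User:` turn containing the `<image>` placeholder and the question, closed by the `<end_of_utterance>` stop token, then an `Assistant:` cue for generation. A minimal sketch of that layout (`build_prompt` is a hypothetical helper for illustration, not part of the pipeline):

```python
# Hypothetical helper illustrating the SmolVLM prompt layout used by the
# C++ runner; the special tokens come from the command-line flags above.
EOU = "<end_of_utterance>"  # --stop_token


def build_prompt(question: str) -> str:
    """Build a single-image SmolVLM prompt.

    The <image> placeholder is expanded into image tokens inside the
    pipeline; here it stays as a literal marker.
    """
    return f"User:<image>{question}{EOU}\nAssistant:"


print(build_prompt("What in the image?"))
# User:<image>What in the image?<end_of_utterance>
# Assistant:
```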
## TFLite conversion

To fine-tune SmolVLM on a specific task, you can follow the fine-tuning tutorial. Then you can convert the model to TFLite using the custom smalvlm scripts (see Readme.md). You can also check the official ai-edge-torch generative documentation.
## Details

The model was converted with the following parameters:
```shell
python convert_to_tflite.py --quantize="dynamic_int8" \
  --checkpoint_path='./models/SmolVLM-256M-Instruct' --output_path="./models/SmolVLM-256M-Instruct-tflite" \
  --mask_as_input=True --prefill_seq_lens=256 --kv_cache_max_len=2048
```
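`dynamic_int8` stores the weights as 8-bit integers while activations stay in floating point at runtime. A rough back-of-envelope size estimate, assuming about 256M weights and ignoring quantization scales, activations, and the KV cache:

```python
# Back-of-envelope weight-storage estimate for dynamic int8 quantization
# (assumes ~256M parameters; real file sizes also include scales and metadata).
params = 256_000_000

fp32_mb = params * 4 / 1e6  # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6  # 1 byte per int8 weight

print(f"fp32 ~{fp32_mb:.0f} MB, int8 ~{int8_mb:.0f} MB")
# fp32 ~1024 MB, int8 ~256 MB  -> roughly 4x smaller on disk
```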
## Model tree for litert-community/SmolVLM-256M-Instruct

Base model lineage: HuggingFaceTB/SmolLM2-135M → HuggingFaceTB/SmolLM2-135M-Instruct → HuggingFaceTB/SmolVLM-256M-Instruct (quantized here to TFLite).