---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---
# SmolVLM2 Real-Time Captioning Demo
This Hugging Face Spaces app uses Gradio v5 Blocks to capture your webcam feed every N milliseconds and run each frame through the SmolVLM2 model on your CPU, displaying live captions below the video.
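A minimal sketch of that loop, assuming Gradio 5's streaming event on the webcam `Image` component; `caption_frame` is a hypothetical stand-in for the actual SmolVLM2 call in `app.py`:

```python
# Minimal sketch, not the actual app.py: wire a streaming webcam Image to a
# caption Textbox with Gradio 5 Blocks. caption_frame is a placeholder for
# the SmolVLM2 / llama.cpp inference call.
import gradio as gr

def caption_frame(frame):
    if frame is None:
        return "Waiting for webcam..."
    # Real app: encode the frame and run it through the GGUF model here.
    return "placeholder caption"

with gr.Blocks() as demo:
    webcam = gr.Image(sources=["webcam"], streaming=True, label="Webcam")
    caption = gr.Textbox(label="Live caption")
    # stream_every (seconds) controls how often the browser sends a frame.
    webcam.stream(caption_frame, inputs=webcam, outputs=caption, stream_every=0.5)

demo.launch()
```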
## Features
- CPU-only inference via `llama-cpp-python` wrapping `llama.cpp` (see the sketch after this list).
- Gradio live streaming for low-latency, browser-native video input.
- Adjustable interval slider (100 ms to 10 s) for frame capture frequency.
- Automatic GGUF model download from Hugging Face Hub when missing.
- Debug logging in the terminal for tracing each inference step.
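The inference path is plain `llama-cpp-python` on CPU. The sketch below shows one way to load the GGUF weights and caption a JPEG frame; the `Llava15ChatHandler` class and the prompt layout are assumptions borrowed from llama-cpp-python's generic multimodal support, not necessarily what `app.py` uses for SmolVLM2.

```python
# Sketch of CPU-only inference with llama-cpp-python. The handler class and
# prompt layout are assumptions; check app.py for the exact setup.
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # assumed handler

llm = Llama(
    model_path="SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    chat_handler=Llava15ChatHandler(
        clip_model_path="mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
    ),
    n_ctx=4096,     # room for image tokens plus the caption
    verbose=True,   # prints per-step debug logs to the terminal
)

def caption_jpeg(jpeg_bytes: bytes) -> str:
    # llama-cpp-python accepts images as data URIs inside chat messages.
    data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()
    result = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},
                {"type": "text", "text": "Describe this frame in one sentence."},
            ],
        }],
        max_tokens=64,
    )
    return result["choices"][0]["message"]["content"]
```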
## Setup

Clone this repository:

```bash
git clone <your-space-repo-url>
cd <your-space-repo-name>
```

Install dependencies:

```bash
pip install -r requirements.txt
```
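As a rough guide (an assumption, not the verified contents of `requirements.txt`), the app needs at least:

```text
gradio
llama-cpp-python
huggingface_hub
```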
(Optional) Pre-download model files. These will be downloaded automatically if absent:

- `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
- `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

To skip the downloads, place both GGUF files in the repo root.
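If you prefer to script the pre-download, `huggingface_hub` works; the `repo_id` below is an assumed example, so check `app.py` for the actual source repository:

```python
# Sketch: fetch both GGUF files into the repo root ahead of time.
# The repo_id is an assumption; app.py defines the real source repo.
from huggingface_hub import hf_hub_download

for filename in (
    "SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    "mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf",
):
    hf_hub_download(
        repo_id="ggml-org/SmolVLM2-500M-Video-Instruct-GGUF",  # assumed
        filename=filename,
        local_dir=".",
    )
```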
## Usage

1. Launch the app:

   ```bash
   python app.py
   ```

2. Open your browser at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).
3. Allow webcam access when prompted.
4. Adjust the capture interval using the slider in the UI (one possible wiring is sketched after this list).
5. Live captions will appear below each video frame.
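One way an interval slider can drive the capture rate is with a `gr.Timer` whose period is rebound whenever the slider changes. This is a hedged sketch of that pattern, not necessarily how `app.py` implements it:

```python
# Sketch: slider-controlled capture interval via gr.Timer (Gradio 5).
import gradio as gr

def caption_frame(frame):
    if frame is None:
        return "Waiting for webcam..."
    return "placeholder caption"  # real app: SmolVLM2 inference here

with gr.Blocks() as demo:
    webcam = gr.Image(sources=["webcam"], streaming=True, label="Webcam")
    interval = gr.Slider(0.1, 10.0, value=1.0, step=0.1,
                         label="Capture interval (seconds)")
    caption = gr.Textbox(label="Live caption")

    timer = gr.Timer(value=1.0)  # fires every `value` seconds
    timer.tick(caption_frame, inputs=webcam, outputs=caption)
    # Returning a new Timer from the change handler retunes the period.
    interval.change(lambda s: gr.Timer(value=s), inputs=interval, outputs=timer)

demo.launch()
```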
## File Structure

- `app.py` — Main Gradio v5 Blocks application.
- `requirements.txt` — Python dependencies.
- `*.gguf` — Model files (auto-downloaded or user-provided).