---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---
# SmolVLM2 Real-Time Captioning Demo
This Hugging Face Spaces app uses Gradio v5 Blocks to capture your webcam feed every N milliseconds and run each frame through the SmolVLM2 model on your CPU, displaying live captions below the video.
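A minimal sketch of that loop, assuming Gradio 5's streaming event on the webcam `Image` component; `caption_frame` is a hypothetical stand-in for the actual SmolVLM2 call in `app.py`:

```python
# Minimal sketch, not the actual app.py: wire a streaming webcam Image to a
# caption Textbox with Gradio 5 Blocks. caption_frame is a placeholder for
# the SmolVLM2 / llama.cpp inference call.
import gradio as gr

def caption_frame(frame):
    if frame is None:
        return "Waiting for webcam..."
    # Real app: encode the frame and run it through the GGUF model here.
    return "placeholder caption"

with gr.Blocks() as demo:
    webcam = gr.Image(sources=["webcam"], streaming=True, label="Webcam")
    caption = gr.Textbox(label="Live caption")
    # stream_every (seconds) controls how often the browser sends a frame.
    webcam.stream(caption_frame, inputs=webcam, outputs=caption, stream_every=0.5)

demo.launch()
```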
## Features
- CPU-only inference via `llama-cpp-python` wrapping `llama.cpp` (see the sketch after this list).
- Gradio live streaming for low-latency, browser-native video input.
- Adjustable interval slider (100 ms to 10 s) for frame capture frequency.
- Automatic GGUF model download from Hugging Face Hub when missing.
- Debug logging in the terminal for tracing each inference step.
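The inference path is plain `llama-cpp-python` on CPU. The sketch below shows one way to load the GGUF weights and caption a JPEG frame; the `Llava15ChatHandler` class and the prompt layout are assumptions borrowed from llama-cpp-python's generic multimodal support, not necessarily what `app.py` uses for SmolVLM2.

```python
# Sketch of CPU-only inference with llama-cpp-python. The handler class and
# prompt layout are assumptions; check app.py for the exact setup.
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # assumed handler

llm = Llama(
    model_path="SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    chat_handler=Llava15ChatHandler(
        clip_model_path="mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
    ),
    n_ctx=4096,     # room for image tokens plus the caption
    verbose=True,   # prints per-step debug logs to the terminal
)

def caption_jpeg(jpeg_bytes: bytes) -> str:
    # llama-cpp-python accepts images as data URIs inside chat messages.
    data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()
    result = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},
                {"type": "text", "text": "Describe this frame in one sentence."},
            ],
        }],
        max_tokens=64,
    )
    return result["choices"][0]["message"]["content"]
```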
## Setup

Clone this repository:

```bash
git clone <your-space-repo-url>
cd <your-space-repo-name>
```

Install dependencies:

```bash
pip install -r requirements.txt
```
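As a rough guide (an assumption, not the verified contents of `requirements.txt`), the app needs at least:

```text
gradio
llama-cpp-python
huggingface_hub
```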
(Optional) Pre-download model files. These will be downloaded automatically if absent:

- `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
- `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

To skip the downloads, place both GGUF files in the repo root.
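If you prefer to script the pre-download, `huggingface_hub` works; the `repo_id` below is an assumed example, so check `app.py` for the actual source repository:

```python
# Sketch: fetch both GGUF files into the repo root ahead of time.
# The repo_id is an assumption; app.py defines the real source repo.
from huggingface_hub import hf_hub_download

for filename in (
    "SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    "mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf",
):
    hf_hub_download(
        repo_id="ggml-org/SmolVLM2-500M-Video-Instruct-GGUF",  # assumed
        filename=filename,
        local_dir=".",
    )
```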
## Usage

1. Launch the app:

   ```bash
   python app.py
   ```

2. Open your browser at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).
3. Allow webcam access when prompted.
4. Adjust the capture interval using the slider in the UI (one possible wiring is sketched after this list).
5. Live captions will appear below each video frame.
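One way an interval slider can drive the capture rate is with a `gr.Timer` whose period is rebound whenever the slider changes. This is a hedged sketch of that pattern, not necessarily how `app.py` implements it:

```python
# Sketch: slider-controlled capture interval via gr.Timer (Gradio 5).
import gradio as gr

def caption_frame(frame):
    if frame is None:
        return "Waiting for webcam..."
    return "placeholder caption"  # real app: SmolVLM2 inference here

with gr.Blocks() as demo:
    webcam = gr.Image(sources=["webcam"], streaming=True, label="Webcam")
    interval = gr.Slider(0.1, 10.0, value=1.0, step=0.1,
                         label="Capture interval (seconds)")
    caption = gr.Textbox(label="Live caption")

    timer = gr.Timer(value=1.0)  # fires every `value` seconds
    timer.tick(caption_frame, inputs=webcam, outputs=caption)
    # Returning a new Timer from the change handler retunes the period.
    interval.change(lambda s: gr.Timer(value=s), inputs=interval, outputs=timer)

demo.launch()
```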
## File Structure

- `app.py` — Main Gradio v5 Blocks application.
- `requirements.txt` — Python dependencies.
- `*.gguf` — Model files (auto-downloaded or user-provided).