---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---

# SmolVLM2 Real-Time Captioning Demo

This Hugging Face Spaces app uses **Gradio v5 Blocks** to capture your webcam feed every *N* milliseconds, run each frame through the SmolVLM2 model on your CPU, and display a live caption below the video.

## Features

* **CPU-only inference** via `llama-cpp-python`, the Python bindings for `llama.cpp`.
* **Gradio live streaming** for low-latency, browser-native video input.
* **Adjustable interval slider** (100 ms to 10 s) for frame-capture frequency.
* **Automatic GGUF model download** from the Hugging Face Hub when the files are missing.
* **Debug logging** in the terminal for tracing each inference step.

## Setup

1. **Clone this repository**

   ```bash
   git clone <this-repo-url>
   cd <repo-directory>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **(Optional) Pre-download model files**

   The following files are downloaded automatically if absent:

   * `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip the downloads, place both GGUF files in the repository root.

## Usage

1. **Launch the app**:

   ```bash
   python app.py
   ```

2. **Open your browser** at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).
3. **Allow webcam access** when prompted.
4. **Adjust the capture interval** using the slider in the UI.
5. **Live captions** appear below each video frame.

## File Structure

* `app.py` — Main Gradio v5 Blocks application.
* `requirements.txt` — Python dependencies.
* `*.gguf` model files (auto-downloaded or user-provided).

## License

MIT (see the `license` field in the Space metadata above).
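## Appendix: Per-Frame Inference Sketch

For reference, a minimal sketch of the per-frame step the app performs on each captured image. It assumes the Space feeds frames to `llama-cpp-python`'s multimodal chat API (`Llama.create_chat_completion`), which accepts images as base64 data URIs; the helper name `frame_to_messages` and the prompt text are illustrative, not taken from `app.py` — check that file for the exact implementation.

```python
import base64


def frame_to_messages(jpeg_bytes: bytes,
                      prompt: str = "Describe this frame in one sentence."):
    """Build the chat-message payload for one webcam frame.

    The JPEG bytes are inlined as a base64 data URI, the image format
    llama-cpp-python's multimodal chat handlers accept alongside text.
    """
    data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

In the app itself, a payload like this would be passed to `llm.create_chat_completion(messages=...)` on each captured frame, and the returned text shown as the caption.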
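## Appendix: Model Auto-Download Sketch

The automatic GGUF download described in Setup can be sketched as below. The repo id is an assumption (the README does not name the source repository — see `app.py` for the one actually used); the file names match the two GGUF files listed above, and `hf_hub_download` is the standard `huggingface_hub` helper for fetching a single file.

```python
from pathlib import Path

# Assumed source repository; replace with the repo id used in app.py.
REPO_ID = "<gguf-model-repo-id>"

REQUIRED_FILES = [
    "SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    "mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf",
]


def missing_files(root: str = ".") -> list[str]:
    """Return the required GGUF files not yet present under `root`."""
    return [name for name in REQUIRED_FILES if not (Path(root) / name).exists()]


def ensure_models(root: str = ".") -> None:
    """Download any missing GGUF files; a no-op when both are on disk."""
    # Imported lazily so the presence check itself needs no network access.
    from huggingface_hub import hf_hub_download

    for name in missing_files(root):
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=root)
```

Placing both files in the repository root makes `missing_files` return an empty list, which is why pre-downloading (Setup step 3) skips the network entirely.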