Luigi's picture
Update README.md
427620d verified

A newer version of the Gradio SDK is available: 5.34.2

Upgrade
metadata
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1

SmolVLM2 Real-Time Captioning Demo

This Hugging Face Spaces app uses Gradio v5 Blocks to capture your webcam feed every N milliseconds and run it through the SmolVLM2 model on your CPU, displaying live captions below each frame.

Features

  • CPU-only inference via llama-cpp-python wrapping llama.cpp.
  • Gradio live streaming for low-latency, browser-native video input.
  • Adjustable interval slider (100 ms to 10 s) for frame capture frequency.
  • Automatic GGUF model download from Hugging Face Hub when missing.
  • Debug logging in the terminal for tracing each inference step.

Setup

  1. Clone this repository

    git clone <your-space-repo-url>
    cd <your-space-repo-name>
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. (Optional) Pre-download model files These will be automatically downloaded if absent:

    • SmolVLM2-500M-Video-Instruct.Q8_0.gguf
    • mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf

    To skip downloads, place both GGUF files in the repo root.

Usage

  1. Launch the app:

    python app.py
    
  2. Open your browser at the URL shown in the terminal (e.g. http://127.0.0.1:7860).

  3. Allow webcam access when prompted.

  4. Adjust the capture interval using the slider in the UI.

  5. Live captions will appear below each video frame.

File Structure

  • app.py — Main Gradio v5 Blocks application.
  • requirements.txt — Python dependencies.
  • .gguf model files (auto-downloaded or user-provided).

License