# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Stashface is a Python-based face recognition application that identifies performers in images using ensemble machine learning models. It provides a Gradio web interface for uploading images and searching for performer matches against a database of known performers.

## Common Commands

### Installation and Setup

```bash
uv sync  # Install dependencies with uv (or `uv pip install -r requirements.txt` if the repo has no pyproject.toml)
```

### Running the Application

```bash
python app.py  # Launch the Gradio web interface on localhost:7860
```

### Testing

```bash
pytest tests/                     # Run all tests
pytest tests/test_vtt_parser.py   # Run a specific test file
```

### Environment Variables

- `DEEPFACE_HOME`: Set to "." (current directory) for DeepFace model storage
- `CUDA_VISIBLE_DEVICES`: Set to "-1" to force CPU usage
- `VISAGE_KEY`: Required for decrypting the performer database in `persons.zip`

## Architecture

### Core Components

1. **DataManager** (`models/data_manager.py`): Handles loading and querying face recognition data
   - Manages the encrypted performer database (`data/persons.zip`)
   - Loads face embeddings from JSON (`data/faces.json`)
   - Handles Voyager vector indices for the FaceNet and ArcFace models

2. **EnsembleFaceRecognition** (`models/face_recognition.py`): Implements ensemble face recognition
   - Combines FaceNet512 and ArcFace models using weighted voting
   - Normalizes distances and computes confidence scores
   - Uses the DeepFace backend for face detection and embedding extraction

3. **WebInterface** (`web/interface.py`): Gradio-based web interface
   - Two main tabs: Multiple Face Search and Faces in Sprite
   - Handles image uploads and displays JSON results
   - Integrates with the image processing pipeline

4. **Image Processing** (`models/image_processor.py`): Core image processing logic
   - Extracts faces using YOLOv8 and MediaPipe detectors
   - Generates embeddings for original and horizontally flipped images
   - Returns performer information with confidence scores

### Data Flow

1. User uploads an image through the Gradio interface
2. Face detection extracts individual faces from the image
3. Face embeddings are generated using the ensemble models (FaceNet + ArcFace)
4. Embeddings are queried against the Voyager vector indices
5. Results are ranked by confidence and returned with performer metadata

### Key Dependencies

- **DeepFace**: Face recognition and embedding extraction
- **Gradio**: Web interface framework
- **Voyager**: Vector similarity search indices
- **MediaPipe**: Alternative face detection backend
- **PyZipper**: Encrypted ZIP file handling for the performer database
- **uv**: Modern Python package manager

## File Structure

```
stashface/
├── app.py                     # Main application entry point
├── data/                      # Face recognition data files
│   ├── faces.json             # Face metadata
│   ├── persons.zip            # Encrypted performer database
│   └── *.voy                  # Voyager vector indices
├── models/                    # Core ML models and data handling
│   ├── data_manager.py        # Data loading and querying
│   ├── face_recognition.py    # Ensemble face recognition
│   └── image_processor.py     # Image processing pipeline
├── web/                       # Web interface
│   └── interface.py           # Gradio interface
├── utils/                     # Utility functions
│   └── vtt_parser.py          # VTT file parsing for video sprites
└── tests/                     # Test files
```

## Development Notes

- The application uses CPU-only inference (CUDA is disabled via `CUDA_VISIBLE_DEVICES=-1`)
- Face embeddings are averaged between original and horizontally flipped images for better accuracy
- The performer database is encrypted and requires the `VISAGE_KEY` environment variable
- Vector indices use the E4M3 storage format for memory efficiency
- The ensemble approach combines FaceNet512 and ArcFace models with equal weighting (1.0 each)
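
The flip averaging and equal-weight ensemble described in the notes above can be illustrated with a small, dependency-free sketch. It is not the code in `models/face_recognition.py`: the function names `average_with_flip` and `ensemble_confidence`, the input shapes, and the min-max distance normalization are all assumptions made for illustration, since this file only states that distances are normalized and that both models carry a weight of 1.0.

```python
from collections import defaultdict


def average_with_flip(original, flipped):
    """Element-wise mean of the original and mirrored-face embeddings."""
    if len(original) != len(flipped):
        raise ValueError("embeddings must have the same length")
    return [(a + b) / 2.0 for a, b in zip(original, flipped)]


def ensemble_confidence(model_hits, weights):
    """Combine per-model nearest-neighbour hits into ranked confidences.

    model_hits: {model_name: [(performer_id, distance), ...]}  (hypothetical shape)
    weights:    {model_name: weight}; a missing model defaults to 1.0
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for model, hits in model_hits.items():
        if not hits:
            continue
        weight = weights.get(model, 1.0)
        weight_sum += weight
        distances = [d for _, d in hits]
        lo, span = min(distances), max(distances) - min(distances)
        best = {}
        for performer, dist in hits:
            # Assumed min-max normalization: closest hit -> 1.0, farthest -> 0.0.
            similarity = 1.0 - (dist - lo) / span if span else 1.0
            # Keep only the best hit per performer within a single model.
            best[performer] = max(best.get(performer, 0.0), similarity)
        for performer, similarity in best.items():
            totals[performer] += weight * similarity
    if weight_sum == 0:
        return []
    # Divide by the total weight so confidences stay in [0, 1].
    ranked = [(p, s / weight_sum) for p, s in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up hits from two indices, using the equal weights noted above.
hits = {
    "facenet512": [("performer_a", 0.2), ("performer_b", 0.5), ("performer_c", 0.8)],
    "arcface": [("performer_b", 0.1), ("performer_a", 0.4), ("performer_d", 0.5)],
}
ranked = ensemble_confidence(hits, {"facenet512": 1.0, "arcface": 1.0})
```

In this sketch a performer returned by only one model still receives a score, but it is diluted by the other model's weight, so candidates that both models agree on rank higher.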