# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Stashface is a Python-based face recognition application that identifies performers in images using an ensemble of face recognition models.
It provides a Gradio web interface for uploading images and searching for performer matches against a database of known performers.

## Common Commands

### Installation and Setup
```bash
uv sync  # Install dependencies with uv (assumes a pyproject.toml; use `uv pip install -r requirements.txt` otherwise)
```

### Running the Application
```bash
python app.py  # Launch the Gradio web interface on localhost:7860
```

### Testing
```bash
pytest tests/  # Run all tests
pytest tests/test_vtt_parser.py  # Run specific test file
```

### Environment Variables
- `DEEPFACE_HOME`: Base directory for DeepFace model storage; set to `.` (the current directory)
- `CUDA_VISIBLE_DEVICES`: Set to `-1` to hide all GPUs and force CPU inference
- `VISAGE_KEY`: Required; the password used to decrypt the performer database in `data/persons.zip`
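A minimal sketch of applying these variables from Python. It assumes they should be set before DeepFace (and its TensorFlow backend) is imported so that they take effect. The function name `configure_environment` is hypothetical and is not part of `app.py`.

```python
import os


def configure_environment() -> str:
    """Apply the environment settings listed above and return the database key."""
    # Default DeepFace's model storage to the current directory unless already set.
    os.environ.setdefault("DEEPFACE_HOME", ".")
    # Hide every CUDA GPU so inference runs on the CPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    # Fail fast: without the key, data/persons.zip cannot be decrypted.
    key = os.environ.get("VISAGE_KEY")
    if not key:
        raise RuntimeError("VISAGE_KEY is not set; cannot decrypt data/persons.zip")
    return key
```

Call it at the very top of the entry point, before `from deepface import DeepFace`.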

## Architecture

### Core Components

1. **DataManager** (`models/data_manager.py`): Handles loading and querying face recognition data
   - Manages encrypted performer database (`data/persons.zip`)
   - Loads face embeddings from JSON (`data/faces.json`)
   - Handles Voyager vector indices for FaceNet and ArcFace models

2. **EnsembleFaceRecognition** (`models/face_recognition.py`): Implements ensemble face recognition
   - Combines FaceNet512 and ArcFace models using weighted voting
   - Normalizes distances and computes confidence scores
   - Uses DeepFace backend for face detection and embedding extraction

3. **WebInterface** (`web/interface.py`): Gradio-based web interface
   - Two main tabs: Multiple Face Search and Faces in Sprite
   - Handles image uploads and displays JSON results
   - Integrates with image processing pipeline

4. **Image Processing** (`models/image_processor.py`): Core image processing logic
   - Extracts faces using YOLOv8 and MediaPipe detectors
   - Generates embeddings for original and horizontally flipped images
   - Returns performer information with confidence scores
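The weighted voting in `EnsembleFaceRecognition` can be pictured with the following sketch. It rests on several assumptions. Each model is assumed to return `(performer_id, distance)` hits. Distances are assumed to be normalized by dividing by a per-model cut-off. Confidence is assumed to be the weighted similarity divided by the total weight. The cut-off values and the function name `ensemble_vote` are hypothetical and are not taken from `models/face_recognition.py`.

```python
from collections import defaultdict

# Equal weighting (1.0 each), as stated in the Development Notes.
WEIGHTS = {"Facenet512": 1.0, "ArcFace": 1.0}
# Hypothetical cut-offs: a distance at or above this counts as zero similarity.
# Placeholders only, not the project's tuned values.
MAX_DISTANCE = {"Facenet512": 1.0, "ArcFace": 1.0}


def ensemble_vote(
    results: dict[str, list[tuple[str, float]]],
) -> list[tuple[str, float]]:
    """Combine per-model (performer_id, distance) hits into ranked (performer_id, confidence)."""
    if not results:
        return []
    total_weight = sum(WEIGHTS[model] for model in results)
    scores: dict[str, float] = defaultdict(float)
    for model, hits in results.items():
        # Keep only each performer's closest hit so a model votes at most once per performer.
        best: dict[str, float] = {}
        for performer_id, distance in hits:
            best[performer_id] = min(distance, best.get(performer_id, float("inf")))
        for performer_id, distance in best.items():
            # Normalize the distance into [0, 1], then convert it to a similarity.
            normalized = min(max(distance, 0.0) / MAX_DISTANCE[model], 1.0)
            scores[performer_id] += WEIGHTS[model] * (1.0 - normalized)
    # Dividing by the total weight keeps every confidence in [0, 1].
    ranked = [(pid, score / total_weight) for pid, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

A performer found by both models accumulates votes from each. Agreement between FaceNet512 and ArcFace therefore pushes that performer up the ranking, while a match reported by only one model is capped at half the maximum confidence.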

### Data Flow

1. User uploads an image through the Gradio interface
2. Face detection extracts individual faces from the image
3. Face embeddings are generated with the ensemble models (FaceNet512 + ArcFace)
4. Embeddings are queried against the Voyager vector indices
5. Results are ranked by confidence and returned with performer metadata
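Steps 4 and 5 amount to a nearest-neighbour lookup. The sketch below replaces the Voyager index with a brute-force scan over a dict so that it runs without extra dependencies. It assumes cosine distance, although Voyager also offers Euclidean and inner-product spaces. It also assumes a hypothetical `id_to_performer` mapping from index IDs to performers.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; returns 1.0 (orthogonal) if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def search_face(
    embedding: list[float],
    index: dict[int, list[float]],
    id_to_performer: dict[int, str],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Brute-force stand-in for a vector index query."""
    # Score every stored vector against the query embedding.
    hits = [(item_id, cosine_distance(embedding, vec)) for item_id, vec in index.items()]
    # Keep the k closest and attach the performer each stored face belongs to.
    hits.sort(key=lambda hit: hit[1])
    return [(id_to_performer[item_id], distance) for item_id, distance in hits[:k]]
```

A real vector index avoids the linear scan. The returned `(performer, distance)` pairs are the kind of input that the ensemble step then turns into confidence scores.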

### Key Dependencies

- **DeepFace**: Face recognition and embedding extraction
- **Gradio**: Web interface framework
- **Voyager**: Vector similarity search indices
- **MediaPipe**: Alternative face detection backend
- **PyZipper**: Encrypted ZIP file handling for performer database
- **uv**: Python package and project manager used to install dependencies

## File Structure

```
stashface/
├── app.py                   # Main application entry point
├── data/                    # Face recognition data files
│   ├── faces.json           # Face metadata
│   ├── persons.zip          # Encrypted performer database
│   └── *.voy                # Voyager vector indices
├── models/                  # Core ML models and data handling
│   ├── data_manager.py      # Data loading and querying
│   ├── face_recognition.py  # Ensemble face recognition
│   └── image_processor.py   # Image processing pipeline
├── web/                     # Web interface
│   └── interface.py         # Gradio interface
├── utils/                   # Utility functions
│   └── vtt_parser.py        # VTT file parsing for video sprites
└── tests/                   # Test files
```
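The "Faces in Sprite" tab works from video sprite sheets described by a VTT file. A parser might look like the sketch below. It assumes the common thumbnail-track convention, in which each cue's payload is an image URL with a `#xywh=x,y,w,h` fragment. `parse_sprite_vtt` and `SpriteCue` are hypothetical names and are not the API of `utils/vtt_parser.py`.

```python
import re
from dataclasses import dataclass

# WebVTT timestamps: optional hours, then mm:ss.ttt.
_TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
CUE_TIMING = re.compile(_TS + r"\s*-->\s*" + _TS)
# Media-fragment suffix naming one tile inside the sprite image.
XYWH = re.compile(r"#xywh=(\d+),(\d+),(\d+),(\d+)")


@dataclass
class SpriteCue:
    start: float  # seconds
    end: float  # seconds
    x: int
    y: int
    w: int
    h: int


def _seconds(h, m, s, ms) -> float:
    # A missing hours group (None) counts as zero hours.
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_sprite_vtt(text: str) -> list[SpriteCue]:
    """Pair each cue's time range with the sprite tile it points at."""
    cues: list[SpriteCue] = []
    timing = None
    for line in text.splitlines():
        line = line.strip()
        match = CUE_TIMING.match(line)
        if match:
            groups = match.groups()
            timing = (_seconds(*groups[:4]), _seconds(*groups[4:]))
            continue
        tile = XYWH.search(line)
        if timing and tile:
            cues.append(SpriteCue(*timing, *map(int, tile.groups())))
            timing = None
    return cues
```

Each tile can then be cut out with Pillow, e.g. `sprite.crop((c.x, c.y, c.x + c.w, c.y + c.h))`, and passed to face detection.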

## Development Notes

- The application runs inference on the CPU only (CUDA is disabled by setting `CUDA_VISIBLE_DEVICES=-1`)
- For each face, the embeddings of the original and the horizontally flipped image are averaged to improve accuracy
- The performer database is encrypted and requires the `VISAGE_KEY` environment variable
- Vector indices use the E4M3 (8-bit floating-point) storage format to reduce memory use
- The ensemble approach combines FaceNet512 and ArcFace models with equal weighting (1.0 each)
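The flip averaging mentioned above can be sketched as follows. It assumes embeddings are plain lists of floats and that the averaged vector is L2-normalized again before it is queried. The re-normalization step and the function names are assumptions and are not confirmed from `models/image_processor.py`.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave an all-zero vector unchanged instead of dividing by zero.
    return [x / norm for x in vector] if norm else list(vector)


def flip_averaged_embedding(original: list[float], flipped: list[float]) -> list[float]:
    """Average a face's embedding with its mirror image's embedding, then re-normalize."""
    if len(original) != len(flipped):
        raise ValueError("embeddings must have the same dimension")
    mean = [(a + b) / 2 for a, b in zip(original, flipped)]
    return l2_normalize(mean)
```

If the face crop is held as a NumPy array of shape (H, W, C), the mirrored copy is `face[:, ::-1]`. Embed both images with the same model, then combine the two vectors as shown.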