# Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

## Architecture Overview

The system has been split into two main components:

1. **Dispatcher** (`dispatcher.py`): Handles WebSocket connections, manages user queues, and routes requests to workers
2. **Worker** (`worker.py`): Runs the actual model inference on individual GPUs

## Files Overview

- `main.py` - Original single-GPU implementation (kept as a backup)
- `dispatcher.py` - Queue management and WebSocket handling
- `worker.py` - GPU worker for model inference
- `start_workers.py` - Helper script to start multiple workers
- `start_system.sh` - Shell script to start the entire system
- `tail_workers.py` - Script to monitor all worker logs simultaneously
- `requirements.txt` - Dependencies
- `static/index.html` - Frontend interface

## Setup Instructions

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

```bash
python dispatcher.py
```

### 3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

#### GPU 0:
```bash
python worker.py --gpu-id 0
```

#### GPU 1:
```bash
python worker.py --gpu-id 1
```

#### GPU 2:
```bash
python worker.py --gpu-id 2
```

And so on for additional GPUs. Workers listen on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).

### 4. Access the Application

Open your browser and go to: `http://localhost:7860`

## System Behavior

### Queue Management

- **No Queue**: Users get the normal timeout behavior (20 seconds of inactivity)
- **With Queue**: Users get a limited session time (60 seconds) with warnings and grace periods
- **Grace Period**: If the queue becomes empty during the grace period, time limits are removed

### User Experience

1. **Immediate Access**: If GPUs are available, users start immediately
2. **Queue Position**: Users see their position and estimated wait time
3. **Session Warnings**: Users get warnings when their time is running out
4. **Grace Period**: 10-second countdown when session time expires, but if the queue empties, users can continue
5. **Queue Updates**: Real-time updates on queue position every 5 seconds

### Worker Management

- Workers automatically register with the dispatcher on startup
- Workers send periodic pings (every 10 seconds) to maintain the connection
- Workers handle session cleanup when users disconnect
- Each worker can handle one session at a time

### Input Queue Optimization

The system implements intelligent input filtering to maintain performance:

- **Queue Management**: Each worker maintains an input queue per session
- **Interesting Input Detection**: The system distinguishes "interesting" inputs (clicks, key presses) from uninteresting ones (mouse movements)
- **Smart Processing**: When multiple inputs are queued (see the sketch below):
  - Processes "interesting" inputs immediately, skipping the mouse movements queued ahead of them
  - If no interesting inputs are found, processes only the latest mouse position
  - This prevents the system from getting bogged down processing every mouse movement
- **Performance**: Maintains responsiveness even during rapid mouse movements
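The filtering logic described above can be sketched roughly as follows. This is a minimal illustration; the class and function names (`InputEvent`, `is_interesting`, `select_next_input`) are hypothetical and are not the actual `worker.py` internals.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InputEvent:
    x: int
    y: int
    is_click: bool = False
    key: Optional[str] = None   # non-None for key presses

def is_interesting(event: InputEvent) -> bool:
    """Clicks and key presses are 'interesting'; bare mouse moves are not."""
    return event.is_click or event.key is not None

def select_next_input(queue: List[InputEvent]) -> Optional[InputEvent]:
    """Pick the next event to run inference on, dropping the backlog it supersedes.

    - If any interesting event is queued, return the earliest one and discard
      the mouse movements queued before it.
    - Otherwise, collapse the backlog to the latest mouse position.
    """
    if not queue:
        return None
    for i, event in enumerate(queue):
        if is_interesting(event):
            del queue[: i + 1]      # skip the mouse moves that preceded it
            return event
    latest = queue[-1]              # only mouse moves: keep the newest position
    queue.clear()
    return latest
```

Calling something like `select_next_input` once per inference step keeps the backlog bounded no matter how quickly mouse events arrive.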
## Configuration

### Dispatcher Settings (in `dispatcher.py`)

```python
self.IDLE_TIMEOUT = 20.0                  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0   # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0    # 15 seconds before timeout
self.GRACE_PERIOD = 10.0
```

### Worker Settings (in `worker.py`)

```python
self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False
```

## Monitoring

### Health Checks

Check worker health:

```bash
curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
```

### Logs

The system provides detailed logging for debugging and monitoring:

**Dispatcher logs:**
- `dispatcher.log` - All dispatcher activity, session management, and queue operations

**Worker logs:**
- `workers.log` - Summary output from the worker startup script
- `worker_gpu_0.log` - Detailed logs from the GPU 0 worker
- `worker_gpu_1.log` - Detailed logs from the GPU 1 worker
- `worker_gpu_N.log` - Detailed logs from the GPU N worker

**Monitor all worker logs:**

```bash
# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log
```

## Troubleshooting

### Common Issues

1. **Worker not registering**: Check that the dispatcher is running first
2. **GPU memory issues**: Ensure each worker is assigned to a different GPU
3. **Port conflicts**: Make sure ports 7860, 8001, 8002, etc. are available
4. **Model loading errors**: Check that the model files and configurations are present

### Debug Mode

Enable debug logging by setting the log level in both files:

```python
logging.basicConfig(level=logging.DEBUG)
```

## Scaling

To add more GPUs:

1. Start additional workers with higher GPU IDs
2. Workers automatically register with the dispatcher
3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs available.

## API Endpoints

### Dispatcher

- `GET /` - Serve the web interface
- `WebSocket /ws` - User connections
- `POST /register_worker` - Worker registration
- `POST /worker_ping` - Worker health pings

### Worker

- `POST /process_input` - Process user input
- `POST /end_session` - Clean up session
- `GET /health` - Health check
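For reference, a worker's registration-and-ping lifecycle against the dispatcher endpoints listed above might look roughly like the sketch below. The endpoint paths, ports, and 10-second ping interval come from this guide; the payload fields (`gpu_id`, `port`) and the use of the `requests` library are illustrative assumptions, not the actual `worker.py` code.

```python
# Hypothetical sketch of a worker registering with the dispatcher and pinging it.
import time
import requests

DISPATCHER_URL = "http://localhost:7860"

def register_worker(gpu_id: int) -> None:
    # Workers listen on 8001 + GPU_ID (see "Start Workers" above).
    payload = {"gpu_id": gpu_id, "port": 8001 + gpu_id}
    requests.post(f"{DISPATCHER_URL}/register_worker", json=payload, timeout=5)

def ping_loop(gpu_id: int) -> None:
    # Periodic pings (every 10 seconds) keep the worker marked as alive.
    while True:
        requests.post(f"{DISPATCHER_URL}/worker_ping", json={"gpu_id": gpu_id}, timeout=5)
        time.sleep(10)

if __name__ == "__main__":
    register_worker(gpu_id=0)
    ping_loop(gpu_id=0)
```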