# Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

## Architecture Overview

The system has been split into two main components:

1. **Dispatcher** (`dispatcher.py`): Handles WebSocket connections, manages user queues, and routes requests to workers
2. **Worker** (`worker.py`): Runs the actual model inference on individual GPUs

## Files Overview

- `main.py` - Original single-GPU implementation (kept as a backup)
- `dispatcher.py` - Queue management and WebSocket handling
- `worker.py` - GPU worker for model inference
- `start_workers.py` - Helper script to start multiple workers
- `start_system.sh` - Shell script to start the entire system
- `tail_workers.py` - Script to monitor all worker logs simultaneously
- `requirements.txt` - Dependencies
- `static/index.html` - Frontend interface

## Setup Instructions

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

```bash
python dispatcher.py
```

### 3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

#### GPU 0:
```bash
python worker.py --gpu-id 0
```

#### GPU 1:
```bash
python worker.py --gpu-id 1
```

#### GPU 2:
```bash
python worker.py --gpu-id 2
```

And so on for additional GPUs. Workers listen on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).

### 4. Access the Application

Open your browser and go to: `http://localhost:7860`

## System Behavior

### Queue Management

- **No Queue**: Users get the normal timeout behavior (20 seconds of inactivity)
- **With Queue**: Users get a limited session time (60 seconds) with warnings and grace periods
- **Grace Period**: If the queue becomes empty during the grace period, time limits are removed

### User Experience

1. **Immediate Access**: If GPUs are available, users start immediately
2. **Queue Position**: Users see their position and estimated wait time
3. **Session Warnings**: Users get warnings when their time is running out
4. **Grace Period**: 10-second countdown when session time expires, but if the queue empties, users can continue
5. **Queue Updates**: Real-time updates on queue position every 5 seconds

### Worker Management

- Workers automatically register with the dispatcher on startup
- Workers send periodic pings (every 10 seconds) to maintain the connection
- Workers handle session cleanup when users disconnect
- Each worker can handle one session at a time

### Input Queue Optimization

The system implements intelligent input filtering to maintain performance:

- **Queue Management**: Each worker maintains an input queue per session
- **Interesting Input Detection**: The system distinguishes "interesting" inputs (clicks, key presses) from uninteresting ones (mouse movements)
- **Smart Processing**: When multiple inputs are queued (see the sketch below):
  - Processes "interesting" inputs immediately, skipping the mouse movements queued ahead of them
  - If no interesting inputs are found, processes only the latest mouse position
  - This prevents the system from getting bogged down processing every mouse movement
- **Performance**: Maintains responsiveness even during rapid mouse movements
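The filtering logic described above can be sketched roughly as follows. This is a minimal illustration; the class and function names (`InputEvent`, `is_interesting`, `select_next_input`) are hypothetical and are not the actual `worker.py` internals.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InputEvent:
    x: int
    y: int
    is_click: bool = False
    key: Optional[str] = None   # non-None for key presses

def is_interesting(event: InputEvent) -> bool:
    """Clicks and key presses are 'interesting'; bare mouse moves are not."""
    return event.is_click or event.key is not None

def select_next_input(queue: List[InputEvent]) -> Optional[InputEvent]:
    """Pick the next event to run inference on, dropping the backlog it supersedes.

    - If any interesting event is queued, return the earliest one and discard
      the mouse movements queued before it.
    - Otherwise, collapse the backlog to the latest mouse position.
    """
    if not queue:
        return None
    for i, event in enumerate(queue):
        if is_interesting(event):
            del queue[: i + 1]      # skip the mouse moves that preceded it
            return event
    latest = queue[-1]              # only mouse moves: keep the newest position
    queue.clear()
    return latest
```

Calling something like `select_next_input` once per inference step keeps the backlog bounded no matter how quickly mouse events arrive.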
## Configuration

### Dispatcher Settings (in `dispatcher.py`)

```python
self.IDLE_TIMEOUT = 20.0                  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0   # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0    # 15 seconds before timeout
self.GRACE_PERIOD = 10.0
```

### Worker Settings (in `worker.py`)

```python
self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False
```

## Monitoring

### Health Checks

Check worker health:

```bash
curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
```

### Logs

The system provides detailed logging for debugging and monitoring:

**Dispatcher logs:**
- `dispatcher.log` - All dispatcher activity, session management, and queue operations

**Worker logs:**
- `workers.log` - Summary output from the worker startup script
- `worker_gpu_0.log` - Detailed logs from the GPU 0 worker
- `worker_gpu_1.log` - Detailed logs from the GPU 1 worker
- `worker_gpu_N.log` - Detailed logs from the GPU N worker

**Monitor all worker logs:**

```bash
# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log
```

## Troubleshooting

### Common Issues

1. **Worker not registering**: Check that the dispatcher is running first
2. **GPU memory issues**: Ensure each worker is assigned to a different GPU
3. **Port conflicts**: Make sure ports 7860, 8001, 8002, etc. are available
4. **Model loading errors**: Check that the model files and configurations are present

### Debug Mode

Enable debug logging by setting the log level in both files:

```python
logging.basicConfig(level=logging.DEBUG)
```

## Scaling

To add more GPUs:

1. Start additional workers with higher GPU IDs
2. Workers automatically register with the dispatcher
3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs available.

## API Endpoints

### Dispatcher

- `GET /` - Serve the web interface
- `WebSocket /ws` - User connections
- `POST /register_worker` - Worker registration
- `POST /worker_ping` - Worker health pings

### Worker

- `POST /process_input` - Process user input
- `POST /end_session` - Clean up session
- `GET /health` - Health check
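For reference, a worker's registration-and-ping lifecycle against the dispatcher endpoints listed above might look roughly like the sketch below. The endpoint paths, ports, and 10-second ping interval come from this guide; the payload fields (`gpu_id`, `port`) and the use of the `requests` library are illustrative assumptions, not the actual `worker.py` code.

```python
# Hypothetical sketch of a worker registering with the dispatcher and pinging it.
import time
import requests

DISPATCHER_URL = "http://localhost:7860"

def register_worker(gpu_id: int) -> None:
    # Workers listen on 8001 + GPU_ID (see "Start Workers" above).
    payload = {"gpu_id": gpu_id, "port": 8001 + gpu_id}
    requests.post(f"{DISPATCHER_URL}/register_worker", json=payload, timeout=5)

def ping_loop(gpu_id: int) -> None:
    # Periodic pings (every 10 seconds) keep the worker marked as alive.
    while True:
        requests.post(f"{DISPATCHER_URL}/worker_ping", json={"gpu_id": gpu_id}, timeout=5)
        time.sleep(10)

if __name__ == "__main__":
    register_worker(gpu_id=0)
    ping_loop(gpu_id=0)
```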