
Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

Architecture Overview

The system has been split into two main components:

  1. Dispatcher (dispatcher.py): Handles WebSocket connections, manages user queues, and routes requests to workers
  2. Worker (worker.py): Runs the actual model inference on individual GPUs
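
When a user interacts with the demo, the dispatcher receives their input over the WebSocket connection and forwards it to the assigned worker over HTTP. A minimal sketch of that forwarding step, assuming a JSON payload with session_id and input fields (the actual schema is defined in dispatcher.py and worker.py):

import httpx

async def forward_to_worker(worker_port: int, session_id: str, user_input: dict) -> dict:
    # The worker's /process_input endpoint runs one inference step for this session.
    # The payload field names are illustrative; see worker.py for the real schema.
    url = f"http://localhost:{worker_port}/process_input"
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json={"session_id": session_id, "input": user_input})
        response.raise_for_status()
        return response.json()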

Files Overview

  • main.py - Original single-GPU implementation (kept as backup)
  • dispatcher.py - Queue management and WebSocket handling
  • worker.py - GPU worker for model inference
  • start_workers.py - Helper script to start multiple workers
  • start_system.sh - Shell script to start the entire system
  • tail_workers.py - Script to monitor all worker logs simultaneously
  • requirements.txt - Dependencies
  • static/index.html - Frontend interface

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

python dispatcher.py

3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

GPU 0:

python worker.py --gpu-id 0

GPU 1:

python worker.py --gpu-id 1

GPU 2:

python worker.py --gpu-id 2

And so on for additional GPUs.

Workers run on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
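
Rather than launching each worker by hand, you can use start_workers.py. A minimal sketch of what such a launcher might do, assuming it simply spawns one worker.py process per GPU and redirects each one's output to the worker_gpu_N.log files described under Monitoring:

import subprocess

NUM_GPUS = 4  # adjust to the number of GPUs on your machine

for gpu_id in range(NUM_GPUS):
    # Each worker gets its own GPU and its own log file.
    log_file = open(f"worker_gpu_{gpu_id}.log", "w")
    subprocess.Popen(
        ["python", "worker.py", "--gpu-id", str(gpu_id)],
        stdout=log_file,
        stderr=subprocess.STDOUT,
    )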

4. Access the Application

Open your browser and go to: http://localhost:7860

System Behavior

Queue Management

  • No Queue: Users get normal timeout behavior (20 seconds of inactivity)
  • With Queue: Users get limited session time (60 seconds) with warnings and grace periods
  • Grace Period: If the queue becomes empty during the grace period, time limits are removed (see the sketch below)
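
A minimal sketch of how these rules might combine, using the timeout values listed under Configuration below (the actual state machine lives in dispatcher.py):

def check_session(now: float, session_start: float, last_activity: float,
                  queue_length: int) -> str:
    # Constants mirror the dispatcher settings shown under Configuration.
    IDLE_TIMEOUT = 20.0
    MAX_SESSION_TIME_WITH_QUEUE = 60.0
    QUEUE_SESSION_WARNING_TIME = 45.0
    GRACE_PERIOD = 10.0

    if queue_length == 0:
        # No one waiting: only the inactivity timeout applies.
        return "expire" if now - last_activity > IDLE_TIMEOUT else "ok"

    elapsed = now - session_start
    if elapsed > MAX_SESSION_TIME_WITH_QUEUE + GRACE_PERIOD:
        return "expire"  # grace period exhausted, hand the GPU to the next user
    if elapsed > MAX_SESSION_TIME_WITH_QUEUE:
        return "grace"   # countdown running; cancelled if the queue empties
    if elapsed > QUEUE_SESSION_WARNING_TIME:
        return "warn"    # tell the user their time is running out
    return "ok"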

User Experience

  1. Immediate Access: If GPUs are available, users start immediately
  2. Queue Position: Users see their position and estimated wait time
  3. Session Warnings: Users get warnings when their time is running out
  4. Grace Period: A 10-second countdown starts when session time expires; if the queue empties during the countdown, users can continue
  5. Queue Updates: Real-time updates on queue position every 5 seconds
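
A minimal sketch of the periodic queue update, assuming the dispatcher pushes a JSON message over each waiting user's WebSocket every 5 seconds (the message fields and the wait estimate are illustrative, not the actual dispatcher.py protocol):

import asyncio
import json

async def queue_update_loop(waiting_users: list):
    # Every 5 seconds, tell each queued user where they stand.
    while True:
        for position, user in enumerate(waiting_users, start=1):
            await user.websocket.send_text(json.dumps({
                "type": "queue_update",
                "position": position,
                "estimated_wait_seconds": position * 60,  # assumes full-length sessions
            }))
        await asyncio.sleep(5)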

Worker Management

  • Workers automatically register with the dispatcher on startup
  • Workers send periodic pings (every 10 seconds) to maintain their connection with the dispatcher (see the sketch below)
  • Workers handle session cleanup when users disconnect
  • Each worker can handle one session at a time
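
A minimal sketch of the periodic ping, assuming the worker POSTs its identity to the dispatcher's /worker_ping endpoint (listed under API Endpoints below); the payload fields are illustrative:

import asyncio
import httpx

async def ping_loop(gpu_id: int, port: int,
                    dispatcher_url: str = "http://localhost:7860"):
    # Every 10 seconds, tell the dispatcher this worker is still alive.
    async with httpx.AsyncClient() as client:
        while True:
            try:
                await client.post(f"{dispatcher_url}/worker_ping",
                                  json={"gpu_id": gpu_id, "port": port})
            except httpx.HTTPError:
                pass  # dispatcher may be restarting; keep trying
            await asyncio.sleep(10)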

Input Queue Optimization

The system implements intelligent input filtering to maintain performance:

  • Queue Management: Each worker maintains an input queue per session
  • Interesting Input Detection: The system identifies "interesting" inputs (clicks, key presses) vs. uninteresting ones (mouse movements)
  • Smart Processing: When multiple inputs are queued (see the sketch below):
    • Processes "interesting" inputs immediately, skipping over any mouse movements queued ahead of them
    • If no interesting inputs are found, processes only the latest mouse position
    • This prevents the system from getting bogged down processing every mouse movement
  • Performance: Maintains responsiveness even during rapid mouse movements
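
A minimal sketch of this filtering policy (the real queue handling lives in worker.py; the is_interesting test below is an assumption based on the event types described above):

def next_input(queued_inputs: list[dict]) -> dict | None:
    """Pick the next queued input to run through the model."""
    if not queued_inputs:
        return None

    def is_interesting(event: dict) -> bool:
        # Clicks and key presses matter; bare mouse moves usually do not.
        return event.get("type") in ("click", "keydown", "keyup")

    for i, event in enumerate(queued_inputs):
        if is_interesting(event):
            del queued_inputs[: i + 1]  # drop the mouse moves queued ahead of it
            return event

    # Only mouse movements are queued: keep just the most recent position.
    latest = queued_inputs[-1]
    queued_inputs.clear()
    return latest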

Configuration

Dispatcher Settings (in dispatcher.py)

self.IDLE_TIMEOUT = 20.0  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0  # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0  # 15 seconds before timeout
self.GRACE_PERIOD = 10.0

Worker Settings (in worker.py)

self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False

Monitoring

Health Checks

Check worker health:

curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
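
To check every worker at once, a small sweep over the expected ports (using the 8001 + GPU_ID convention above) might look like:

import requests

NUM_GPUS = 2  # adjust to your setup

for gpu_id in range(NUM_GPUS):
    port = 8001 + gpu_id
    try:
        r = requests.get(f"http://localhost:{port}/health", timeout=5)
        print(f"GPU {gpu_id} (port {port}): {r.status_code} {r.text.strip()}")
    except requests.RequestException as exc:
        print(f"GPU {gpu_id} (port {port}): unreachable ({exc})")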

Logs

The system provides detailed logging for debugging and monitoring:

Dispatcher logs:

  • dispatcher.log - All dispatcher activity, session management, queue operations

Worker logs:

  • workers.log - Summary output from the worker startup script
  • worker_gpu_0.log - Detailed logs from GPU 0 worker
  • worker_gpu_1.log - Detailed logs from GPU 1 worker
  • worker_gpu_N.log - Detailed logs from GPU N worker

Monitor all worker logs:

# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log

Troubleshooting

Common Issues

  1. Worker not registering: Check that the dispatcher is running before starting workers
  2. GPU memory issues: Ensure each worker is assigned to a different GPU
  3. Port conflicts: Make sure ports 7860, 8001, 8002, etc. are available
  4. Model loading errors: Check that model files and configurations are present

Debug Mode

Enable debug logging by setting the log level in both dispatcher.py and worker.py:

logging.basicConfig(level=logging.DEBUG)

Scaling

To add more GPUs:

  1. Start additional workers with higher GPU IDs
  2. Workers automatically register with the dispatcher
  3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs available.

API Endpoints

Dispatcher

  • GET / - Serve the web interface
  • WebSocket /ws - User connections
  • POST /register_worker - Worker registration
  • POST /worker_ping - Worker health pings
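
Worker registration happens automatically on startup, but for reference, a registration call might look like the following. The payload fields here are an assumption about how a worker identifies itself; see worker.py for the real schema:

import requests

# Hypothetical payload: the worker reports which GPU it owns and which
# port it serves on, so the dispatcher knows where to route sessions.
requests.post("http://localhost:7860/register_worker",
              json={"gpu_id": 0, "port": 8001})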

Worker

  • POST /process_input - Process user input
  • POST /end_session - Clean up session
  • GET /health - Health check