
Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

Architecture Overview

The system has been split into two main components:

  1. Dispatcher (dispatcher.py): Handles WebSocket connections, manages user queues, and routes requests to workers
  2. Worker (worker.py): Runs the actual model inference on individual GPUs
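
When a user interacts with the demo, the dispatcher receives their input over the WebSocket connection and forwards it to the assigned worker over HTTP. A minimal sketch of that forwarding step, assuming a JSON payload with session_id and input fields (the actual schema is defined in dispatcher.py and worker.py):

import httpx

async def forward_to_worker(worker_port: int, session_id: str, user_input: dict) -> dict:
    # The worker's /process_input endpoint runs one inference step for this session.
    # The payload field names are illustrative; see worker.py for the real schema.
    url = f"http://localhost:{worker_port}/process_input"
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json={"session_id": session_id, "input": user_input})
        response.raise_for_status()
        return response.json()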

Files Overview

  • main.py - Original single-GPU implementation (kept as backup)
  • dispatcher.py - Queue management and WebSocket handling
  • worker.py - GPU worker for model inference
  • start_workers.py - Helper script to start multiple workers
  • start_system.sh - Shell script to start the entire system
  • tail_workers.py - Script to monitor all worker logs simultaneously
  • requirements.txt - Dependencies
  • static/index.html - Frontend interface

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

python dispatcher.py

3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

GPU 0:

python worker.py --gpu-id 0

GPU 1:

python worker.py --gpu-id 1

GPU 2:

python worker.py --gpu-id 2

And so on for additional GPUs.

Workers run on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
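
Rather than launching each worker by hand, you can use start_workers.py. A minimal sketch of what such a launcher might do, assuming it simply spawns one worker.py process per GPU and redirects each one's output to the worker_gpu_N.log files described under Monitoring:

import subprocess

NUM_GPUS = 4  # adjust to the number of GPUs on your machine

for gpu_id in range(NUM_GPUS):
    # Each worker gets its own GPU and its own log file.
    log_file = open(f"worker_gpu_{gpu_id}.log", "w")
    subprocess.Popen(
        ["python", "worker.py", "--gpu-id", str(gpu_id)],
        stdout=log_file,
        stderr=subprocess.STDOUT,
    )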

4. Access the Application

Open your browser and go to: http://localhost:7860

System Behavior

Queue Management

  • No Queue: Users get normal timeout behavior (20 seconds of inactivity)
  • With Queue: Users get limited session time (60 seconds) with warnings and grace periods
  • Grace Period: If the queue becomes empty during the grace period, time limits are removed (see the sketch below)
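
A minimal sketch of how these rules might combine, using the timeout values listed under Configuration below (the actual state machine lives in dispatcher.py):

def check_session(now: float, session_start: float, last_activity: float,
                  queue_length: int) -> str:
    # Constants mirror the dispatcher settings shown under Configuration.
    IDLE_TIMEOUT = 20.0
    MAX_SESSION_TIME_WITH_QUEUE = 60.0
    QUEUE_SESSION_WARNING_TIME = 45.0
    GRACE_PERIOD = 10.0

    if queue_length == 0:
        # No one waiting: only the inactivity timeout applies.
        return "expire" if now - last_activity > IDLE_TIMEOUT else "ok"

    elapsed = now - session_start
    if elapsed > MAX_SESSION_TIME_WITH_QUEUE + GRACE_PERIOD:
        return "expire"  # grace period exhausted, hand the GPU to the next user
    if elapsed > MAX_SESSION_TIME_WITH_QUEUE:
        return "grace"   # countdown running; cancelled if the queue empties
    if elapsed > QUEUE_SESSION_WARNING_TIME:
        return "warn"    # tell the user their time is running out
    return "ok"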

User Experience

  1. Immediate Access: If GPUs are available, users start immediately
  2. Queue Position: Users see their position and estimated wait time
  3. Session Warnings: Users get warnings when their time is running out
  4. Grace Period: A 10-second countdown starts when session time expires; if the queue empties during the countdown, users can continue
  5. Queue Updates: Real-time updates on queue position every 5 seconds
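
A minimal sketch of the periodic queue update, assuming the dispatcher pushes a JSON message over each waiting user's WebSocket every 5 seconds (the message fields and the wait estimate are illustrative, not the actual dispatcher.py protocol):

import asyncio
import json

async def queue_update_loop(waiting_users: list):
    # Every 5 seconds, tell each queued user where they stand.
    while True:
        for position, user in enumerate(waiting_users, start=1):
            await user.websocket.send_text(json.dumps({
                "type": "queue_update",
                "position": position,
                "estimated_wait_seconds": position * 60,  # assumes full-length sessions
            }))
        await asyncio.sleep(5)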

Worker Management

  • Workers automatically register with the dispatcher on startup
  • Workers send periodic pings (every 10 seconds) to maintain their connection with the dispatcher (see the sketch below)
  • Workers handle session cleanup when users disconnect
  • Each worker can handle one session at a time
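
A minimal sketch of the periodic ping, assuming the worker POSTs its identity to the dispatcher's /worker_ping endpoint (listed under API Endpoints below); the payload fields are illustrative:

import asyncio
import httpx

async def ping_loop(gpu_id: int, port: int,
                    dispatcher_url: str = "http://localhost:7860"):
    # Every 10 seconds, tell the dispatcher this worker is still alive.
    async with httpx.AsyncClient() as client:
        while True:
            try:
                await client.post(f"{dispatcher_url}/worker_ping",
                                  json={"gpu_id": gpu_id, "port": port})
            except httpx.HTTPError:
                pass  # dispatcher may be restarting; keep trying
            await asyncio.sleep(10)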

Input Queue Optimization

The system implements intelligent input filtering to maintain performance:

  • Queue Management: Each worker maintains an input queue per session
  • Interesting Input Detection: The system identifies "interesting" inputs (clicks, key presses) vs. uninteresting ones (mouse movements)
  • Smart Processing: When multiple inputs are queued (see the sketch below):
    • Processes "interesting" inputs immediately, skipping over any mouse movements queued ahead of them
    • If no interesting inputs are found, processes only the latest mouse position
    • This prevents the system from getting bogged down processing every mouse movement
  • Performance: Maintains responsiveness even during rapid mouse movements
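
A minimal sketch of this filtering policy (the real queue handling lives in worker.py; the is_interesting test below is an assumption based on the event types described above):

def next_input(queued_inputs: list[dict]) -> dict | None:
    """Pick the next queued input to run through the model."""
    if not queued_inputs:
        return None

    def is_interesting(event: dict) -> bool:
        # Clicks and key presses matter; bare mouse moves usually do not.
        return event.get("type") in ("click", "keydown", "keyup")

    for i, event in enumerate(queued_inputs):
        if is_interesting(event):
            del queued_inputs[: i + 1]  # drop the mouse moves queued ahead of it
            return event

    # Only mouse movements are queued: keep just the most recent position.
    latest = queued_inputs[-1]
    queued_inputs.clear()
    return latest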

Configuration

Dispatcher Settings (in dispatcher.py)

self.IDLE_TIMEOUT = 20.0  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0  # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0  # 15 seconds before timeout
self.GRACE_PERIOD = 10.0

Worker Settings (in worker.py)

self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False

Monitoring

Health Checks

Check worker health:

curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
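
To check every worker at once, a small sweep over the expected ports (using the 8001 + GPU_ID convention above) might look like:

import requests

NUM_GPUS = 2  # adjust to your setup

for gpu_id in range(NUM_GPUS):
    port = 8001 + gpu_id
    try:
        r = requests.get(f"http://localhost:{port}/health", timeout=5)
        print(f"GPU {gpu_id} (port {port}): {r.status_code} {r.text.strip()}")
    except requests.RequestException as exc:
        print(f"GPU {gpu_id} (port {port}): unreachable ({exc})")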

Logs

The system provides detailed logging for debugging and monitoring:

Dispatcher logs:

  • dispatcher.log - All dispatcher activity, session management, queue operations

Worker logs:

  • workers.log - Summary output from the worker startup script
  • worker_gpu_0.log - Detailed logs from GPU 0 worker
  • worker_gpu_1.log - Detailed logs from GPU 1 worker
  • worker_gpu_N.log - Detailed logs from GPU N worker

Monitor all worker logs:

# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log

Troubleshooting

Common Issues

  1. Worker not registering: Check that the dispatcher is running before starting workers
  2. GPU memory issues: Ensure each worker is assigned to a different GPU
  3. Port conflicts: Make sure ports 7860, 8001, 8002, etc. are available
  4. Model loading errors: Check that model files and configurations are present

Debug Mode

Enable debug logging by setting the log level in both dispatcher.py and worker.py:

logging.basicConfig(level=logging.DEBUG)

Scaling

To add more GPUs:

  1. Start additional workers with higher GPU IDs
  2. Workers automatically register with the dispatcher
  3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs available.

API Endpoints

Dispatcher

  • GET / - Serve the web interface
  • WebSocket /ws - User connections
  • POST /register_worker - Worker registration
  • POST /worker_ping - Worker health pings
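
Worker registration happens automatically on startup, but for reference, a registration call might look like the following. The payload fields here are an assumption about how a worker identifies itself; see worker.py for the real schema:

import requests

# Hypothetical payload: the worker reports which GPU it owns and which
# port it serves on, so the dispatcher knows where to route sessions.
requests.post("http://localhost:7860/register_worker",
              json={"gpu_id": 0, "port": 8001})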

Worker

  • POST /process_input - Process user input
  • POST /end_session - Clean up session
  • GET /health - Health check