Multi-GPU Setup Guide
This guide explains how to run the neural OS demo with multiple GPUs and user queue management.
Architecture Overview
The system has been split into two main components:
- Dispatcher (dispatcher.py): Handles WebSocket connections, manages user queues, and routes requests to workers
- Worker (worker.py): Runs the actual model inference on individual GPUs
Files Overview
- main.py - Original single-GPU implementation (kept as backup)
- dispatcher.py - Queue management and WebSocket handling
- worker.py - GPU worker for model inference
- start_workers.py - Helper script to start multiple workers
- start_system.sh - Shell script to start the entire system
- tail_workers.py - Script to monitor all worker logs simultaneously
- requirements.txt - Dependencies
- static/index.html - Frontend interface
Setup Instructions
1. Install Dependencies
pip install -r requirements.txt
2. Start the Dispatcher
The dispatcher runs on port 7860 and manages user connections and queues:
python dispatcher.py
3. Start Workers (One per GPU)
Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.
GPU 0:
python worker.py --gpu-id 0
GPU 1:
python worker.py --gpu-id 1
GPU 2:
python worker.py --gpu-id 2
And so on for additional GPUs.
Workers run on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
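As a rough illustration of the port scheme, here is a minimal Python sketch of how a worker might turn its --gpu-id flag into a listening port. Pinning the process to one GPU via CUDA_VISIBLE_DEVICES is an assumption about how worker.py isolates devices, not a documented detail.

```python
import argparse
import os

def main():
    # Same flag as the commands above: python worker.py --gpu-id N
    parser = argparse.ArgumentParser(description="Worker port/GPU selection (sketch)")
    parser.add_argument("--gpu-id", type=int, required=True)
    args = parser.parse_args()

    # Port scheme from this guide: 8001 + GPU_ID
    port = 8001 + args.gpu_id

    # Assumption: each worker restricts itself to a single GPU so that
    # several workers can share one machine without stepping on each other.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu_id)

    print(f"Worker for GPU {args.gpu_id} listens on port {port}")

if __name__ == "__main__":
    main()
```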
4. Access the Application
Open your browser and go to: http://localhost:7860
System Behavior
Queue Management
- No Queue: Users get normal timeout behavior (20 seconds of inactivity)
- With Queue: Users get a limited session time (60 seconds) with warnings and grace periods
- Grace Period: If the queue becomes empty during the grace period, time limits are removed (see the sketch below)
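A minimal sketch of how these rules might be applied is shown below. The constants mirror the Dispatcher Settings in the Configuration section; the decision function itself is an assumption for illustration, not the actual dispatcher.py code.

```python
# Constants copied from the Dispatcher Settings section of this guide.
IDLE_TIMEOUT = 20.0                 # when no queue
MAX_SESSION_TIME_WITH_QUEUE = 60.0  # when there's a queue
QUEUE_SESSION_WARNING_TIME = 45.0   # 15 seconds before timeout
GRACE_PERIOD = 10.0

def session_action(session_age: float, idle_for: float, queue_length: int) -> str:
    """Decide what to do with a session (hypothetical helper, re-evaluated periodically)."""
    if queue_length == 0:
        # No queue: only inactivity ends a session. Because this is re-checked
        # regularly, an emptied queue also lifts any running time limit.
        return "end_session" if idle_for >= IDLE_TIMEOUT else "ok"
    if session_age >= MAX_SESSION_TIME_WITH_QUEUE + GRACE_PERIOD:
        return "end_session"
    if session_age >= MAX_SESSION_TIME_WITH_QUEUE:
        return "grace_period"  # 10-second countdown shown to the user
    if session_age >= QUEUE_SESSION_WARNING_TIME:
        return "warn"
    return "ok"
```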
User Experience
- Immediate Access: If GPUs are available, users start immediately
- Queue Position: Users see their position and estimated wait time
- Session Warnings: Users get warnings when their time is running out
- Grace Period: 10-second countdown when session time expires, but if the queue empties, users can continue
- Queue Updates: Real-time updates on queue position every 5 seconds (see the update-loop sketch below)
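The sketch below shows one way such a 5-second broadcast loop could look. The message fields and the queued_users / estimate_wait helpers are hypothetical, not the dispatcher's real interface.

```python
import asyncio
import json

UPDATE_INTERVAL = 5.0  # queue positions are pushed every 5 seconds

async def broadcast_queue_updates(queued_users, estimate_wait):
    """queued_users: ordered list of connections exposing an async send(text) method."""
    while True:
        for position, user in enumerate(list(queued_users), start=1):
            message = json.dumps({
                "type": "queue_update",
                "position": position,
                "estimated_wait_seconds": estimate_wait(position),
            })
            try:
                await user.send(message)
            except Exception:
                # A connection that dropped mid-update is skipped; real cleanup
                # is handled elsewhere in the dispatcher.
                continue
        await asyncio.sleep(UPDATE_INTERVAL)
```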
Worker Management
- Workers automatically register with the dispatcher on startup
- Workers send periodic pings (every 10 seconds) to maintain their connection (see the heartbeat sketch below)
- Workers handle session cleanup when users disconnect
- Each worker can handle one session at a time
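A minimal heartbeat sketch is shown below. The /register_worker and /worker_ping endpoints and the port scheme come from this guide; the payload fields are assumptions about what dispatcher.py expects.

```python
import time
import requests

DISPATCHER_URL = "http://localhost:7860"
PING_INTERVAL = 10.0  # workers ping every 10 seconds

def run_worker_heartbeat(gpu_id: int) -> None:
    # Payload fields are assumptions; the real worker may send more metadata.
    payload = {"gpu_id": gpu_id, "port": 8001 + gpu_id}

    # Register once on startup...
    requests.post(f"{DISPATCHER_URL}/register_worker", json=payload, timeout=5)

    # ...then ping periodically so the dispatcher knows this worker is alive.
    while True:
        time.sleep(PING_INTERVAL)
        try:
            requests.post(f"{DISPATCHER_URL}/worker_ping", json=payload, timeout=5)
        except requests.RequestException:
            pass  # dispatcher temporarily unreachable; keep trying

if __name__ == "__main__":
    run_worker_heartbeat(gpu_id=0)
```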
Input Queue Optimization
The system implements intelligent input filtering to maintain performance; a simplified selection rule is sketched after the list below:
- Queue Management: Each worker maintains an input queue per session
- Interesting Input Detection: The system identifies "interesting" inputs (clicks, key presses) vs. uninteresting ones (mouse movements)
- Smart Processing: When multiple inputs are queued:
  - Processes "interesting" inputs immediately, skipping boring mouse movements
  - If no interesting inputs are found, processes the latest mouse position
  - This prevents the system from getting bogged down processing every mouse movement
- Performance: Maintains responsiveness even during rapid mouse movements
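The following is a simplified version of that selection rule. The event representation (a dict with a "type" field) is an assumption; the real worker may queue inputs differently.

```python
# Hypothetical event format: {"type": "click" | "keydown" | "mousemove" | ..., ...}
INTERESTING_TYPES = {"click", "mousedown", "mouseup", "keydown", "keyup"}

def select_next_input(queue):
    """Pick the next queued event for a session, dropping stale mouse movements."""
    if not queue:
        return None

    # Process the first "interesting" event (click, key press) immediately,
    # discarding the mouse movements queued ahead of it.
    for i, event in enumerate(queue):
        if event.get("type") in INTERESTING_TYPES:
            del queue[: i + 1]
            return event

    # Only mouse movements are queued: keep just the latest position.
    latest = queue[-1]
    queue.clear()
    return latest
```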
Configuration
Dispatcher Settings (in dispatcher.py)
self.IDLE_TIMEOUT = 20.0 # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0 # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0 # 15 seconds before timeout
self.GRACE_PERIOD = 10.0
Worker Settings (in worker.py)
self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False
Monitoring
Health Checks
Check worker health:
curl http://localhost:8001/health # GPU 0
curl http://localhost:8002/health # GPU 1
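The one-liners above can also be wrapped in a small Python check. Only the HTTP status code is inspected, since this guide does not specify the health response body.

```python
import requests

def check_workers(num_gpus: int) -> None:
    # Worker ports follow the 8001 + GPU_ID scheme described earlier.
    for gpu_id in range(num_gpus):
        url = f"http://localhost:{8001 + gpu_id}/health"
        try:
            resp = requests.get(url, timeout=2)
            status = "OK" if resp.ok else f"HTTP {resp.status_code}"
        except requests.RequestException as exc:
            status = f"unreachable ({exc.__class__.__name__})"
        print(f"GPU {gpu_id} worker at {url}: {status}")

if __name__ == "__main__":
    check_workers(num_gpus=2)
```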
Logs
The system provides detailed logging for debugging and monitoring:
Dispatcher logs:
- dispatcher.log - All dispatcher activity, session management, queue operations
Worker logs:
- workers.log - Summary output from the worker startup script
- worker_gpu_0.log - Detailed logs from GPU 0 worker
- worker_gpu_1.log - Detailed logs from GPU 1 worker
- worker_gpu_N.log - Detailed logs from GPU N worker
Monitor all worker logs:
# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2
# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log
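If you prefer not to juggle several terminals, the sketch below shows roughly what a tail_workers.py-style multi-tail can do. It is an illustration of the idea, not the script shipped in the repo.

```python
import time

def tail_worker_logs(num_gpus: int) -> None:
    # Follow every worker_gpu_N.log at once, prefixing lines with the GPU ID.
    files = {}
    for gpu_id in range(num_gpus):
        f = open(f"worker_gpu_{gpu_id}.log", "r")
        f.seek(0, 2)  # start at the end of the file, like `tail -f`
        files[gpu_id] = f

    while True:
        got_line = False
        for gpu_id, f in files.items():
            line = f.readline()
            if line:
                print(f"[GPU {gpu_id}] {line}", end="")
                got_line = True
        if not got_line:
            time.sleep(0.5)

if __name__ == "__main__":
    tail_worker_logs(num_gpus=2)
```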
Troubleshooting
Common Issues
- Worker not registering: Check that the dispatcher is running first
- GPU memory issues: Ensure each worker is assigned to a different GPU
- Port conflicts: Make sure ports 7860, 8001, 8002, etc. are available
- Model loading errors: Check that model files and configurations are present
Debug Mode
Enable debug logging by setting log level in both files:
logging.basicConfig(level=logging.DEBUG)
Scaling
To add more GPUs:
- Start additional workers with higher GPU IDs
- Workers automatically register with the dispatcher
- Queue processing automatically utilizes all available workers
The system scales horizontally - add as many workers as you have GPUs available.
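For reference, a start_workers.py-style launcher can be as simple as the sketch below (one process per GPU, each writing to its own worker_gpu_N.log). This is an illustration, not the actual helper script.

```python
import subprocess
import sys

def start_workers(num_gpus: int):
    # Launch one worker per GPU, mirroring the manual commands above.
    procs = []
    for gpu_id in range(num_gpus):
        log_file = open(f"worker_gpu_{gpu_id}.log", "w")
        procs.append(subprocess.Popen(
            [sys.executable, "worker.py", "--gpu-id", str(gpu_id)],
            stdout=log_file,
            stderr=subprocess.STDOUT,
        ))
    return procs

if __name__ == "__main__":
    start_workers(num_gpus=2)
```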
API Endpoints
Dispatcher
- GET / - Serve the web interface
- WebSocket /ws - User connections
- POST /register_worker - Worker registration
- POST /worker_ping - Worker health pings
Worker
- POST /process_input - Process user input
- POST /end_session - Clean up session
- GET /health - Health check