Ollama Worker Setup Guide
Overview
The Ollama worker runs locally and uses your GPU for video processing, reducing cloud compute costs and avoiding cloud API quota limits.
Prerequisites
1. Install Ollama
# Windows (PowerShell)
winget install Ollama.Ollama
# Or download from https://ollama.ai/download
2. Start Ollama Service
ollama serve
3. Pull Required Models
# Pull the main LLM model
ollama pull llama3.2:latest
# Pull the whisper model for transcription
ollama pull whisper:latest
# Optional: Pull other models
ollama pull llama3.2:13b # For better quality (requires more VRAM)
ollama pull codellama:latest # For code analysis
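Before moving on, you can confirm that the models the worker expects are actually available by querying Ollama's /api/tags endpoint. A minimal sketch in Python (the model names mirror the pulls above; adjust them to whatever you actually pulled):

```python
# check_models.py: verify that the models the worker expects are pulled locally.
# Assumes the standard Ollama REST API at http://localhost:11434.
import requests

REQUIRED = {"llama3.2:latest", "whisper:latest"}  # adjust to the models you pulled

def missing_models(base_url: str = "http://localhost:11434") -> set:
    resp = requests.get(f"{base_url}/api/tags", timeout=10)
    resp.raise_for_status()
    installed = {m["name"] for m in resp.json().get("models", [])}
    return REQUIRED - installed

if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing models, run `ollama pull` for:", ", ".join(sorted(missing)))
    else:
        print("All required models are available.")
```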
Configuration
Environment Variables
Create a .env file or set these variables:
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:latest
OLLAMA_WHISPER_MODEL=whisper:latest
# Worker Configuration
OLLAMA_POLL_INTERVAL_SECONDS=120
OLLAMA_MAX_VIDEOS_PER_CYCLE=1
OLLAMA_BACKOFF_SECONDS=300
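For reference, here is a sketch of how a worker could read these variables at startup. It is illustrative only, not the actual code in worker/ollama_daemon.py; the variable names and defaults come from the list above:

```python
# Illustrative config loader; mirrors the environment variables listed above.
import os
from dataclasses import dataclass

@dataclass
class OllamaWorkerConfig:
    base_url: str = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    model: str = os.getenv("OLLAMA_MODEL", "llama3.2:latest")
    whisper_model: str = os.getenv("OLLAMA_WHISPER_MODEL", "whisper:latest")
    poll_interval_seconds: int = int(os.getenv("OLLAMA_POLL_INTERVAL_SECONDS", "120"))
    max_videos_per_cycle: int = int(os.getenv("OLLAMA_MAX_VIDEOS_PER_CYCLE", "1"))
    backoff_seconds: int = int(os.getenv("OLLAMA_BACKOFF_SECONDS", "300"))

if __name__ == "__main__":
    print(OllamaWorkerConfig())  # shows the effective configuration
```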
Model Selection
- llama3.2:latest - Good balance of speed and quality
- llama3.2:13b - Better quality, needs more VRAM
- llama3.2:70b - Best quality, needs 40GB+ VRAM
- whisper:latest - For transcription (works well on GPU)
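If you want to automate this choice, the sketch below suggests a tag from the list above based on free GPU memory. It assumes an NVIDIA GPU and the nvidia-ml-py (pynvml) package, and the VRAM thresholds are rough guesses rather than benchmarks:

```python
# Illustrative: suggest an OLLAMA_MODEL tag based on free GPU memory.
# Requires `pip install nvidia-ml-py` and an NVIDIA GPU; thresholds are rough guesses.
import pynvml

def pick_model() -> str:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        free_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).free / 1024**3
    finally:
        pynvml.nvmlShutdown()
    if free_gb >= 40:
        return "llama3.2:70b"      # best quality, per the list above
    if free_gb >= 12:
        return "llama3.2:13b"      # better quality
    return "llama3.2:latest"       # safe default

if __name__ == "__main__":
    print("Suggested OLLAMA_MODEL:", pick_model())
```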
Running the Worker
Option 1: Windows Batch File
# Double-click or run:
start-ollama-worker.bat
Option 2: Manual Start
# Activate virtual environment
aienv\Scripts\activate
# Set environment variables
set OLLAMA_POLL_INTERVAL_SECONDS=120
set OLLAMA_MAX_VIDEOS_PER_CYCLE=1
set OLLAMA_BASE_URL=http://localhost:11434
set OLLAMA_MODEL=llama3.2:latest
set OLLAMA_WHISPER_MODEL=whisper:latest
# Start worker
python worker\ollama_daemon.py
Option 3: Docker with Ollama
# Add to your Dockerfile
RUN curl -fsSL https://ollama.ai/install.sh | sh
# ollama pull needs the Ollama server running, so start it briefly during the build
RUN ollama serve & sleep 5 && ollama pull llama3.2:latest && ollama pull whisper:latest
Features
1. Local GPU Processing
- Uses your local GPU for all AI tasks
- No cloud API calls for transcription/analysis
- Reduces quota usage and costs
2. Video Processing Pipeline
- Transcription: Ollama Whisper model
- Summarization: Ollama LLM model
- Enhanced Analysis: Topics, sentiment, insights
- PDF Generation: Local report creation
- S3 Upload: Cloud storage for PDFs
3. Error Handling
- Health checks for Ollama service
- Automatic backoff on errors
- Graceful fallbacks
- Comprehensive logging
4. Configuration Options
- Adjustable poll intervals
- Max videos per cycle
- Model selection
- Timeout settings
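To tie features 2-4 together, here is a hedged sketch of one poll/process/backoff cycle. The pipeline helpers (fetch_pending_videos, transcribe, generate_pdf, upload_to_s3) are hypothetical stand-ins for the real implementation in worker/ollama_daemon.py; only the /api/generate call follows Ollama's documented REST API:

```python
# Illustrative worker loop (not the actual ollama_daemon.py): poll for pending
# videos, process up to OLLAMA_MAX_VIDEOS_PER_CYCLE, and back off on errors.
import logging
import os
import time

import requests

BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:latest")
POLL_INTERVAL = int(os.getenv("OLLAMA_POLL_INTERVAL_SECONDS", "120"))
MAX_PER_CYCLE = int(os.getenv("OLLAMA_MAX_VIDEOS_PER_CYCLE", "1"))
BACKOFF = int(os.getenv("OLLAMA_BACKOFF_SECONDS", "300"))

log = logging.getLogger("ollama_worker")

def summarize(transcript: str) -> str:
    """Summarize a transcript with the local LLM via Ollama's /api/generate."""
    resp = requests.post(
        f"{BASE_URL}/api/generate",
        json={"model": MODEL,
              "prompt": f"Summarize this transcript:\n{transcript}",
              "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Hypothetical stubs standing in for the real database, transcription, PDF, and S3 code.
def fetch_pending_videos(limit): return []
def transcribe(video): raise NotImplementedError
def generate_pdf(video, transcript, summary): raise NotImplementedError
def upload_to_s3(path): raise NotImplementedError

def run_forever():
    while True:
        try:
            for video in fetch_pending_videos(limit=MAX_PER_CYCLE):
                transcript = transcribe(video)
                summary = summarize(transcript)
                upload_to_s3(generate_pdf(video, transcript, summary))
            time.sleep(POLL_INTERVAL)
        except Exception:
            log.exception("Cycle failed, backing off for %s seconds", BACKOFF)
            time.sleep(BACKOFF)
```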
Monitoring
Logs
- Worker logs: ollama_worker.log
- Console output for real-time monitoring
- Error tracking and backoff notifications
Health Checks
- Ollama service availability
- Model loading status
- Processing success rates
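A minimal availability check that the worker (or a cron job) could run before each cycle. It relies on two standard Ollama endpoints: GET / returns a short "Ollama is running" banner, and GET /api/tags lists the locally installed models:

```python
# Simple health check: is the Ollama server reachable and does it report any models?
import os

import requests

BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

def ollama_healthy(timeout: float = 5.0) -> bool:
    try:
        # GET / answers with a short "Ollama is running" banner when the server is up.
        if requests.get(BASE_URL, timeout=timeout).status_code != 200:
            return False
        # /api/tags lists the locally installed models.
        models = requests.get(f"{BASE_URL}/api/tags", timeout=timeout).json().get("models", [])
        return len(models) > 0
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("Ollama healthy:", ollama_healthy())
```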
Troubleshooting
Common Issues
Ollama not running
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama
ollama serve
Model not found
# List available models
ollama list
# Pull missing model
ollama pull llama3.2:latest
GPU memory issues
- Use smaller models (llama3.2:latest instead of 13b/70b)
- Reduce batch sizes
- Check GPU memory usage
Slow processing
- Ensure GPU is being used (check Ollama logs)
- Use faster models
- Increase timeout values
Performance Tuning
GPU Optimization
# Pin Ollama to a specific GPU before starting the server
export CUDA_VISIBLE_DEVICES=0
ollama serve
Model Optimization
- Use quantized models for faster inference
- Adjust temperature and top_p parameters
- Monitor VRAM usage
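For the temperature and top_p point, Ollama's /api/generate accepts an options object alongside the prompt. A small sketch with example values (tune them for your workload; the generous timeout also helps on slower hardware, per the Troubleshooting notes above):

```python
# Pass sampling options to Ollama's /api/generate; lower temperature gives more
# deterministic summaries, and a generous timeout helps on slower GPUs.
import os

import requests

BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:latest")

def generate(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/api/generate",
        json={
            "model": MODEL,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": 0.2, "top_p": 0.9},  # example values only
        },
        timeout=600,  # raise this if long transcripts time out on your hardware
    )
    resp.raise_for_status()
    return resp.json()["response"]
```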
Worker Configuration
- Increase OLLAMA_MAX_VIDEOS_PER_CYCLE for batch processing
- Decrease OLLAMA_POLL_INTERVAL_SECONDS for faster processing
- Adjust timeout values based on your hardware
Integration
With Existing System
- Runs alongside the cloud worker
- Processes the same pending videos
- Uses the same database and S3 storage
- Can be used as primary or backup worker
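Because both workers poll the same queue, each should claim a video atomically before processing it so nothing is handled twice. The sketch below is illustrative only: the real database and schema are not documented in this guide, so an in-memory SQLite table stands in for them:

```python
# Illustrative only: both workers share one "pending videos" queue, so each
# should atomically claim a row before processing it. The real schema and
# database are not documented here; an in-memory SQLite table stands in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO videos (status) VALUES ('pending')")
conn.commit()

def claim_next_video(worker_name: str) -> bool:
    """Flip one pending video to 'processing' in a single UPDATE so that two
    workers polling the same table cannot both pick it up."""
    cur = conn.execute(
        "UPDATE videos SET status = ? "
        "WHERE id = (SELECT id FROM videos WHERE status = 'pending' LIMIT 1)",
        (f"processing:{worker_name}",),
    )
    conn.commit()
    return cur.rowcount == 1

print(claim_next_video("ollama-local"))  # True: this worker got the video
print(claim_next_video("cloud-worker"))  # False: nothing left to claim
```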
Switching Between Workers
- Cloud-only: Set WORKER_MAX_VIDEOS_PER_CYCLE=0 in the cloud worker
- Local-only: Stop the cloud worker, run the Ollama worker
- Hybrid: Run both with different priorities
Cost Benefits
- No cloud API costs for transcription/analysis
- Uses local GPU resources
- Reduces quota usage on cloud platforms
- Better privacy (data stays local)
Security
- All processing happens locally
- No data sent to external APIs (except S3 for storage)
- Full control over models and data
- Can run completely offline (except for S3 uploads)