Ollama Worker Setup Guide

Overview

The Ollama worker runs locally and uses your GPU for video processing, which reduces cloud compute costs and avoids API quota issues.

Prerequisites

1. Install Ollama

# Windows (PowerShell)
winget install Ollama.Ollama

# Or download from https://ollama.ai/download

2. Start Ollama Service

ollama serve

3. Pull Required Models

# Pull the main LLM model
ollama pull llama3.2:latest

# Pull the whisper model for transcription
ollama pull whisper:latest

# Optional: Pull other models
ollama pull llama3.2:13b  # For better quality (requires more VRAM)
ollama pull codellama:latest  # For code analysis

Configuration

Environment Variables

Create a .env file or set these variables:

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:latest
OLLAMA_WHISPER_MODEL=whisper:latest

# Worker Configuration
OLLAMA_POLL_INTERVAL_SECONDS=120
OLLAMA_MAX_VIDEOS_PER_CYCLE=1
OLLAMA_BACKOFF_SECONDS=300
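
The worker reads these values at startup; below is a minimal sketch of that lookup with os.getenv (the exact handling in worker\ollama_daemon.py may differ):

import os

# Ollama endpoint and models (defaults mirror the values above)
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:latest")
OLLAMA_WHISPER_MODEL = os.getenv("OLLAMA_WHISPER_MODEL", "whisper:latest")

# Worker loop tuning
POLL_INTERVAL_SECONDS = int(os.getenv("OLLAMA_POLL_INTERVAL_SECONDS", "120"))
MAX_VIDEOS_PER_CYCLE = int(os.getenv("OLLAMA_MAX_VIDEOS_PER_CYCLE", "1"))
BACKOFF_SECONDS = int(os.getenv("OLLAMA_BACKOFF_SECONDS", "300"))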

Model Selection

  • llama3.2:latest - Good balance of speed and quality
  • llama3.2:13b - Better quality, needs more VRAM
  • llama3.2:70b - Best quality, needs 40GB+ VRAM
  • whisper:latest - For transcription (works well on GPU)

Running the Worker

Option 1: Windows Batch File

# Double-click or run:
start-ollama-worker.bat

Option 2: Manual Start

# Activate virtual environment
aienv\Scripts\activate

# Set environment variables
set OLLAMA_POLL_INTERVAL_SECONDS=120
set OLLAMA_MAX_VIDEOS_PER_CYCLE=1
set OLLAMA_BASE_URL=http://localhost:11434
set OLLAMA_MODEL=llama3.2:latest
set OLLAMA_WHISPER_MODEL=whisper:latest

# Start worker
python worker\ollama_daemon.py

Option 3: Docker with Ollama

# Add to your Dockerfile
RUN curl -fsSL https://ollama.ai/install.sh | sh

# ollama pull needs the Ollama server running, so start it in the background for this build step
RUN ollama serve & sleep 5 && ollama pull llama3.2:latest && ollama pull whisper:latest

Features

1. Local GPU Processing

  • Uses your local GPU for all AI tasks
  • No cloud API calls for transcription/analysis
  • Reduces quota usage and costs

2. Video Processing Pipeline

  1. Transcription: Ollama Whisper model
  2. Summarization: Ollama LLM model (see the sketch after this list)
  3. Enhanced Analysis: Topics, sentiment, insights
  4. PDF Generation: Local report creation
  5. S3 Upload: Cloud storage for PDFs
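
As an illustration of the summarization step, the worker can send the transcript to Ollama's /api/generate endpoint using the configured model. This is a sketch, not the worker's exact code; the prompt wording and the summarize helper name are assumptions:

import os
import requests

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:latest")

def summarize(transcript: str) -> str:
    # Single non-streaming request to the local Ollama server
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={
            "model": OLLAMA_MODEL,
            "prompt": f"Summarize this video transcript:\n\n{transcript}",
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]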

3. Error Handling

  • Health checks for Ollama service
  • Automatic backoff on errors (sketched below)
  • Graceful fallbacks
  • Comprehensive logging
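
The backoff behavior can be pictured as a loop that sleeps for the backoff period after a failed cycle instead of the normal poll interval; a rough sketch, not the daemon's exact implementation:

import time

def run_loop(process_cycle, poll_interval=120, backoff_seconds=300):
    # Poll for pending videos; back off after errors instead of retrying immediately
    while True:
        try:
            process_cycle()
            time.sleep(poll_interval)
        except Exception as exc:
            print(f"Cycle failed: {exc}; backing off for {backoff_seconds}s")
            time.sleep(backoff_seconds)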

4. Configuration Options

  • Adjustable poll intervals
  • Max videos per cycle
  • Model selection
  • Timeout settings

Monitoring

Logs

  • Worker logs: ollama_worker.log (example setup below)
  • Console output for real-time monitoring
  • Error tracking and backoff notifications
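
A minimal logging setup that writes to ollama_worker.log and mirrors output to the console might look like the following sketch; the worker's real logging configuration may differ:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[
        logging.FileHandler("ollama_worker.log"),  # persistent worker log
        logging.StreamHandler(),                   # real-time console output
    ],
)
logging.getLogger("ollama_worker").info("Worker started")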

Health Checks

  • Ollama service availability (see the check sketched below)
  • Model loading status
  • Processing success rates
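
One way to implement the availability check is to query Ollama's /api/tags endpoint (the same endpoint used under Troubleshooting) and confirm the required models are present. The function below is a sketch; the required model names are taken from the configuration above:

import requests

def ollama_healthy(base_url="http://localhost:11434",
                   required_models=("llama3.2:latest", "whisper:latest")):
    # Return True if Ollama responds and all required models are pulled
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    available = {m["name"] for m in resp.json().get("models", [])}
    return all(model in available for model in required_models)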

Troubleshooting

Common Issues

  1. Ollama not running

    # Check if Ollama is running
    curl http://localhost:11434/api/tags
    
    # Start Ollama
    ollama serve
    
  2. Model not found

    # List available models
    ollama list
    
    # Pull missing model
    ollama pull llama3.2:latest
    
  3. GPU memory issues

    • Use smaller models (llama3.2:latest instead of 13b/70b)
    • Reduce batch sizes
    • Check GPU memory usage
  4. Slow processing

    • Ensure GPU is being used (check Ollama logs)
    • Use faster models
    • Increase timeout values

Performance Tuning

  1. GPU Optimization

    # Pin Ollama to a specific GPU before starting the server
    export CUDA_VISIBLE_DEVICES=0
    ollama serve
    
  2. Model Optimization

    • Use quantized models for faster inference
    • Adjust temperature and top_p parameters
    • Monitor VRAM usage
  3. Worker Configuration

    • Increase OLLAMA_MAX_VIDEOS_PER_CYCLE for batch processing
    • Decrease OLLAMA_POLL_INTERVAL_SECONDS so pending videos are picked up sooner
    • Adjust timeout values based on your hardware

Integration

With Existing System

  • Runs alongside the cloud worker
  • Processes the same pending videos
  • Uses the same database and S3 storage
  • Can be used as primary or backup worker

Switching Between Workers

  1. Cloud-only: Stop the Ollama worker and let the cloud worker handle all videos
  2. Local-only: Stop the cloud worker (or set WORKER_MAX_VIDEOS_PER_CYCLE=0), run the Ollama worker
  3. Hybrid: Run both with different priorities

Cost Benefits

  • No cloud API costs for transcription/analysis
  • Uses local GPU resources
  • Reduces quota usage on cloud platforms
  • Better privacy (data stays local)

Security

  • All processing happens locally
  • No data sent to external APIs (except S3 for storage)
  • Full control over models and data
  • Can run completely offline (except for S3 uploads)