metadata

title: Agno Document Analysis
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit

Agno Document Analysis Workflow

A sophisticated document processing application built with Agno v1.7.4 featuring a multi-agent workflow for intelligent document analysis and data extraction.

Features

5-Agent Workflow: Coordinator, Prompt Engineer, Data Extractor, Data Arranger, Code Generator
Multi-format Support: PDF, TXT, PNG, JPG, JPEG, DOCX, XLSX, CSV, MD, JSON, XML, HTML, PY, JS, TS, DOC, XLS, PPT, PPTX
Real-time Processing: Streaming interface with live updates
Sandboxed Execution: Safe code execution environment
Beautiful UI: Modern Gradio interface with custom animations

Quick Start

Automated Installation

# Clone the repository
git clone <repository-url>
cd Data_Extractor

# Quick installation (recommended)
./install.sh

# Or use Python setup script
python setup.py

Manual Installation

# Create virtual environment
python -m venv data_extractor_env
source data_extractor_env/bin/activate  # On Windows: data_extractor_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env  # Update with your API keys

# Run the application
python app.py

Installation Options

Requirements Files

requirements-minimal.txt: Essential dependencies only (~50 packages)
```
pip install -r requirements-minimal.txt
```
requirements.txt: Complete feature set (~200+ packages)
```
pip install -r requirements.txt
```
requirements-dev.txt: Development dependencies with testing tools
```
pip install -r requirements-dev.txt
```

System Dependencies

Some features require system-level dependencies:

macOS:

brew install tesseract imagemagick poppler

Ubuntu/Debian:

sudo apt-get install tesseract-ocr libmagickwand-dev poppler-utils

Windows:

choco install tesseract imagemagick poppler

Usage

Setup Environment: Follow installation instructions above
Configure API Keys: Update .env file with your API keys
Upload Document: Support for 20+ file formats
Select Analysis: Choose from predefined types or custom prompts
Process: Watch the multi-agent workflow in real-time
Download Results: Get structured data and generated Excel reports

Environment Variables

Create a .env file with the following variables:

# Required API Keys
GOOGLE_API_KEY=your_google_api_key_here
OPENAI_API_KEY=your_openai_api_key_here  # Optional

# Application Settings
DEBUG=False
LOG_LEVEL=INFO
SESSION_TIMEOUT=3600

# File Processing
MAX_FILE_SIZE=50MB
SUPPORTED_FORMATS=pdf,docx,xlsx,txt

# Database (Optional)
DATABASE_URL=sqlite:///data_extractor.db

Advanced Features

Financial Document Processing

Comprehensive financial data extraction
13-category data organization
Excel report generation with charts
XBRL and SEC filing support

OCR and Image Processing

EasyOCR and PaddleOCR integration
Tesseract OCR support
Advanced image preprocessing

Machine Learning Integration

TensorFlow and PyTorch support
Scikit-learn for data analysis
XGBoost and LightGBM for predictions

Troubleshooting

For detailed troubleshooting and installation issues, see:

INSTALLATION.md - Comprehensive installation guide
FIXES_SUMMARY.md - Known issues and solutions

Common Issues

Import Errors: Try minimal installation first
OCR Issues: Install system dependencies
Memory Issues: Use smaller batch sizes
API Errors: Verify API keys in .env file

Docker Support

# Build and run with Docker
docker build -t data-extractor .
docker run -p 7860:7860 --env-file .env data-extractor