---
title: Agno Document Analysis
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
Agno Document Analysis Workflow
A document processing application built with Agno v1.7.4 that uses a multi-agent workflow to analyze documents and extract structured data.
Features
- 5-Agent Workflow: Coordinator, Prompt Engineer, Data Extractor, Data Arranger, Code Generator
- Multi-format Support: PDF, TXT, PNG, JPG, JPEG, DOCX, XLSX, CSV, MD, JSON, XML, HTML, PY, JS, TS, DOC, XLS, PPT, PPTX
- Real-time Processing: Streaming interface with live updates
- Sandboxed Execution: Safe code execution environment
- Beautiful UI: Modern Gradio interface with custom animations
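The five agents run as a chain, each handing its result to the next. The sketch below illustrates that hand-off in plain Python. It does not use the Agno API and is not this application's implementation: `WorkflowState`, `run_workflow`, and the per-agent functions are hypothetical, and each "agent" is a deterministic stub standing in for an LLM call. `run_workflow` yields after every step, which is one way a streaming UI could show live progress.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class WorkflowState:
    """Hypothetical shared state passed from one agent to the next."""
    document_text: str
    analysis_type: str
    prompt: str = ""
    extracted: dict[str, str] = field(default_factory=dict)
    rows: list[tuple[str, str]] = field(default_factory=list)
    code: str = ""


def coordinator(state: WorkflowState) -> WorkflowState:
    # Normalizes the requested analysis type before delegating.
    state.analysis_type = state.analysis_type.strip().lower() or "general"
    return state


def prompt_engineer(state: WorkflowState) -> WorkflowState:
    # Builds the instruction an extractor agent would send to the model.
    state.prompt = f"Extract {state.analysis_type} fields as 'key: value' pairs."
    return state


def data_extractor(state: WorkflowState) -> WorkflowState:
    # Stub in place of an LLM call: reads "key: value" lines directly.
    for line in state.document_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            state.extracted[key.strip()] = value.strip()
    return state


def data_arranger(state: WorkflowState) -> WorkflowState:
    # Orders the extracted fields into table rows.
    state.rows = sorted(state.extracted.items())
    return state


def code_generator(state: WorkflowState) -> WorkflowState:
    # Emits a script that would write the rows to CSV (it is not executed here).
    state.code = (
        "import csv\n"
        f"rows = {state.rows!r}\n"
        "with open('output.csv', 'w', newline='') as f:\n"
        "    csv.writer(f).writerows(rows)\n"
    )
    return state


PIPELINE: list[Callable[[WorkflowState], WorkflowState]] = [
    coordinator, prompt_engineer, data_extractor, data_arranger, code_generator,
]


def run_workflow(text: str, analysis_type: str) -> Iterator[tuple[str, WorkflowState]]:
    """Runs each agent in order, yielding after every step so a UI can stream progress."""
    state = WorkflowState(text, analysis_type)
    for step in PIPELINE:
        state = step(state)
        yield step.__name__, state
```

For example, `for name, state in run_workflow("Revenue: 100\nYear: 2024", "Financial"): print(name)` prints the five agent names in order. Keeping the agents as a fixed list makes the order explicit; a real coordinator could instead choose which agents to run.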
Quick Start
Automated Installation
# Clone the repository
git clone <repository-url>
cd Data_Extractor
# Quick installation (recommended)
./install.sh
# Or use Python setup script
python setup.py
Manual Installation
# Create virtual environment
python -m venv data_extractor_env
source data_extractor_env/bin/activate # On Windows: data_extractor_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env # Update with your API keys
# Run the application
python app.py
Installation Options
Requirements Files
- requirements-minimal.txt: Essential dependencies only (~50 packages). Install with: pip install -r requirements-minimal.txt
- requirements.txt: Complete feature set (~200+ packages). Install with: pip install -r requirements.txt
- requirements-dev.txt: Development dependencies with testing tools. Install with: pip install -r requirements-dev.txt
System Dependencies
Some features require system-level dependencies:
macOS:
brew install tesseract imagemagick poppler
Ubuntu/Debian:
sudo apt-get install tesseract-ocr libmagickwand-dev poppler-utils
Windows:
choco install tesseract imagemagick poppler
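To confirm the system packages are reachable before starting the app, a check along these lines may help. It is a minimal sketch, not part of the repository: `REQUIRED_TOOLS` and `missing_system_dependencies` are hypothetical names, and it assumes Tesseract provides the `tesseract` executable and Poppler provides `pdftoppm`. ImageMagick is left out because the Ubuntu command above installs it as a development library (libmagickwand-dev) rather than as a command-line tool, and on Windows a bare `convert` can resolve to an unrelated system utility.

```python
import shutil
from typing import Callable, Optional

# Assumed mapping: system package -> executables that indicate it is installed.
REQUIRED_TOOLS: dict[str, tuple[str, ...]] = {
    "tesseract": ("tesseract",),
    "poppler": ("pdftoppm",),
}


def missing_system_dependencies(
    tools: dict[str, tuple[str, ...]] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Returns package names for which none of the candidate executables is on PATH."""
    return [pkg for pkg, exes in tools.items() if not any(which(e) for e in exes)]


if __name__ == "__main__":
    missing = missing_system_dependencies()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("Tesseract and Poppler were found on PATH.")
```

The `which` parameter defaults to `shutil.which` but can be swapped out, which keeps the check testable without touching the real PATH.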
Usage
- Set Up Environment: Follow the installation instructions above
- Configure API Keys: Add your API keys to the .env file
- Upload Document: Upload a file in any of the 19 formats listed under Features
- Select Analysis: Choose from predefined types or custom prompts
- Process: Watch the multi-agent workflow in real-time
- Download Results: Get structured data and generated Excel reports
Environment Variables
Create a .env file with the following variables:
# Required API key
GOOGLE_API_KEY=your_google_api_key_here
# Optional API key
OPENAI_API_KEY=your_openai_api_key_here
# Application Settings
DEBUG=False
LOG_LEVEL=INFO
SESSION_TIMEOUT=3600
# File Processing
MAX_FILE_SIZE=50MB
SUPPORTED_FORMATS=pdf,docx,xlsx,txt
# Database (Optional)
DATABASE_URL=sqlite:///data_extractor.db
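The settings above can be turned into typed values and used to screen uploads. The following is a minimal sketch under stated assumptions, not the app's own loader: `parse_env`, `parse_size`, `Settings`, and `check_upload` are hypothetical names; `parse_env` is a simplified stand-in for a library such as python-dotenv (it skips blank lines and full-line comments but does not handle quoting); and `MAX_FILE_SIZE` units are assumed to be binary (1 MB = 1024 × 1024 bytes).

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumption: sizes use binary units.
_SIZE_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and full-line # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def parse_size(value: str) -> int:
    """Converts a size string such as '50MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)\s*", value.upper())
    if not match:
        raise ValueError(f"Unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _SIZE_UNITS[unit]


@dataclass(frozen=True)
class Settings:
    debug: bool
    log_level: str
    session_timeout: int  # seconds
    max_file_size: int  # bytes
    supported_formats: frozenset[str]

    @classmethod
    def from_env(cls, env: dict[str, str]) -> "Settings":
        formats = env.get("SUPPORTED_FORMATS", "").split(",")
        return cls(
            debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            session_timeout=int(env.get("SESSION_TIMEOUT", "3600")),
            max_file_size=parse_size(env.get("MAX_FILE_SIZE", "50MB")),
            supported_formats=frozenset(f.strip().lower() for f in formats if f.strip()),
        )


def check_upload(filename: str, size_bytes: int, settings: Settings) -> None:
    """Raises ValueError if the file's extension or size is not allowed."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in settings.supported_formats:
        raise ValueError(f"Unsupported format: .{ext}")
    if size_bytes > settings.max_file_size:
        raise ValueError(f"File exceeds {settings.max_file_size} bytes")
```

With the example .env, `Settings.from_env(parse_env(Path(".env").read_text()))` would accept `report.pdf` up to 50 MB and reject a `.png` upload, since PNG is not in that `SUPPORTED_FORMATS` list.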
Advanced Features
Financial Document Processing
- Comprehensive financial data extraction
- 13-category data organization
- Excel report generation with charts
- XBRL and SEC filing support
OCR and Image Processing
- EasyOCR and PaddleOCR integration
- Tesseract OCR support
- Advanced image preprocessing
Machine Learning Integration
- TensorFlow and PyTorch support
- Scikit-learn for data analysis
- XGBoost and LightGBM for predictions
Troubleshooting
For detailed troubleshooting and installation help, see:
- INSTALLATION.md: Comprehensive installation guide
- FIXES_SUMMARY.md: Known issues and solutions
Common Issues
- Import Errors: Try minimal installation first
- OCR Issues: Install system dependencies
- Memory Issues: Use smaller batch sizes
- API Errors: Verify the API keys in your .env file
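For API errors, a quick check of the key values can rule out the most common mistakes before contacting the provider. This is a hedged sketch, not a script shipped with the project: `api_key_problems` is a hypothetical helper, the placeholder strings are copied from the example .env, and treating GOOGLE_API_KEY as required and OPENAI_API_KEY as optional follows that example's comments. It also flags values containing whitespace or `#`, because some loaders (for example `docker run --env-file`) may read a trailing inline comment as part of the value.

```python
import os
from typing import Mapping

PLACEHOLDERS = {"your_google_api_key_here", "your_openai_api_key_here"}
REQUIRED_KEYS = ("GOOGLE_API_KEY",)
OPTIONAL_KEYS = ("OPENAI_API_KEY",)


def api_key_problems(env: Mapping[str, str] = os.environ) -> list[str]:
    """Returns human-readable problems with the configured API keys."""
    problems: list[str] = []
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = env.get(key, "").strip()
        if not value:
            if key in REQUIRED_KEYS:
                problems.append(f"{key} is not set")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still has its placeholder value")
        elif " " in value or "#" in value:
            problems.append(f"{key} contains whitespace or '#' (possibly an inline comment)")
    return problems


if __name__ == "__main__":
    for problem in api_key_problems():
        print(problem)
```

Run it in the same environment the app uses (for example inside the container), since it only sees variables that have already been loaded into the process.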
Docker Support
# Build and run with Docker
docker build -t data-extractor .
docker run -p 7860:7860 --env-file .env data-extractor