DOLPHIN PDF Document AI - HuggingFace Spaces App
A Gradio-based web application for processing PDF documents using the DOLPHIN vision-language model. This app converts PDF files to images and processes them page by page to extract text, tables, and figures.
Features
- PDF Upload: Upload PDF documents directly through the web interface
- Page-by-Page Processing: Converts PDF pages to high-quality images and processes each individually
- Document Parsing: Extracts text, tables, and figures using the DOLPHIN model
- Markdown Output: Generates clean markdown with embedded images and tables
- Memory Optimized: Designed for NVIDIA T4 GPU deployment on HuggingFace Spaces
- Progress Tracking: Real-time progress updates during processing
Files
- `gradio_pdf_app.py` - Main Gradio application with PDF processing functionality
- `app.py` - HuggingFace Spaces entry point
- `requirements_hf_spaces.txt` - Dependencies optimized for HF Spaces deployment
Usage
Local Development
```shell
# Install dependencies
pip install -r requirements_hf_spaces.txt

# Run the app
python gradio_pdf_app.py
```
HuggingFace Spaces Deployment
1. Create a new HuggingFace Space with the Gradio SDK
2. Upload the following files:
   - `app.py`
   - `gradio_pdf_app.py`
   - `utils/` (directory with utility functions)
   - `requirements_hf_spaces.txt` (rename to `requirements.txt`)
3. Configure the Space:
- SDK: Gradio
- Hardware: NVIDIA T4 Small (recommended)
- Python Version: 3.9+
Technical Details
Memory Optimizations
- Uses `torch.float16` for GPU inference
- Smaller batch sizes (4) for element processing
- Memory cleanup with `torch.cuda.empty_cache()`
- Reduced max sequence length (2048) for generation
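The batch-of-4 processing with cleanup between batches can be sketched as follows. `process_elements`, `run_batch`, and `cleanup` are illustrative names, not the app's actual functions; in the app, the cleanup callback would be `torch.cuda.empty_cache()`:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(items: List[T], size: int = 4) -> Iterator[List[T]]:
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_elements(elements: List[T],
                     run_batch: Callable[[List[T]], list],
                     cleanup: Callable[[], None] = lambda: None,
                     batch_size: int = 4) -> list:
    """Run `run_batch` over small batches, invoking `cleanup`
    (e.g. torch.cuda.empty_cache) after each batch to cap peak memory."""
    results = []
    for batch in chunked(elements, batch_size):
        results.extend(run_batch(batch))
        cleanup()  # release cached GPU memory between batches
    return results
```

Keeping batches small trades a little throughput for a much lower peak VRAM footprint, which matters on a 16GB T4.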
PDF Processing Pipeline
1. PDF to Images: Uses PyMuPDF with 2x zoom for quality
2. Layout Analysis: DOLPHIN model parses document structure
3. Element Extraction: Processes text, tables, and figures separately
4. Markdown Generation: Converts results to formatted markdown
5. Gallery View: Creates overview of all processed pages
Model Integration
- Uses HuggingFace transformers implementation
- Loads model with `device_map="auto"` for GPU optimization
- Batch processing for improved efficiency
- Graceful fallback to CPU if GPU unavailable
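The GPU/CPU fallback can be sketched as a small device-selection helper; `pick_device_and_dtype` is an assumed name for illustration. The returned dtype would be passed as `torch_dtype` (alongside `device_map="auto"`) to `from_pretrained`:

```python
import torch

def pick_device_and_dtype():
    """float16 on GPU for the memory savings noted above;
    float32 on CPU, where half precision is slow or unsupported."""
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32
```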
Configuration
The app automatically detects and uses the DOLPHIN model:
- Local path: `./hf_model`
- HuggingFace Hub: `ByteDance/DOLPHIN`
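The local-then-Hub resolution logic amounts to a one-liner; `resolve_model_path` is an illustrative helper, not the app's actual function:

```python
from pathlib import Path

def resolve_model_path(local_dir: str = "./hf_model",
                       hub_id: str = "ByteDance/DOLPHIN") -> str:
    """Prefer a local checkout of the model; otherwise return the
    Hub id, which `from_pretrained` will download on first use."""
    return local_dir if Path(local_dir).is_dir() else hub_id
```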
Dependencies
Core requirements:
- `torch>=2.1.0` - PyTorch for model inference
- `transformers>=4.47.0` - HuggingFace model loading
- `gradio>=5.36.0` - Web interface
- `pymupdf>=1.26.0` - PDF processing
- `pillow>=9.3.0` - Image processing
- `opencv-python-headless>=4.8.0` - Computer vision operations
Error Handling
- Graceful handling of PDF conversion failures
- Memory management for large documents
- Progress reporting for long-running operations
- Fallback markdown generation if converter fails
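The fallback markdown path can be sketched as a plain join over recognized elements. The element shape (dicts with `label` and `text` keys) is an assumption for illustration, not the app's real schema:

```python
def elements_to_markdown(elements: list) -> str:
    """Minimal fallback: concatenate recognized elements into markdown
    when the full converter fails, so the user still gets output."""
    lines = []
    for el in elements:
        label = el.get("label", "text")
        text = el.get("text", "").strip()
        if not text:
            continue  # skip empty recognition results
        if label == "title":
            lines.append(f"## {text}")
        else:
            lines.append(text)  # body text and tables pass through as-is
    return "\n\n".join(lines)
```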
Performance Notes
- Optimized for NVIDIA T4 with 16GB VRAM
- Processing time: ~30-60 seconds per page (depends on complexity)
- Memory usage: ~8-12GB VRAM for typical documents
- CPU fallback available but significantly slower
Example Output
The app generates:
- Markdown Preview: Rendered document with LaTeX support
- Raw Markdown: Source text for copying/editing
- Page Gallery: Visual overview of all processed pages
- JSON Details: Technical processing information
Troubleshooting
- Out of Memory: Reduce batch size or use CPU
- PDF Conversion Failed: Check PDF format compatibility
- Model Loading Error: Verify model path and permissions
- Slow Processing: Ensure GPU is available and configured
Credits
Built on the DOLPHIN model by ByteDance. Optimized for HuggingFace Spaces deployment.