# DOLPHIN PDF Document AI - HuggingFace Spaces App
A Gradio-based web application for processing PDF documents using the DOLPHIN vision-language model. This app converts PDF files to images and processes them page by page to extract text, tables, and figures.
## Features
- **PDF Upload**: Upload PDF documents directly through the web interface
- **Page-by-Page Processing**: Converts PDF pages to high-quality images and processes each individually
- **Document Parsing**: Extracts text, tables, and figures using the DOLPHIN model
- **Markdown Output**: Generates clean markdown with embedded images and tables
- **Memory Optimized**: Designed for NVIDIA T4 GPU deployment on HuggingFace Spaces
- **Progress Tracking**: Real-time progress updates during processing
## Files
- `gradio_pdf_app.py` - Main Gradio application with PDF processing functionality
- `app.py` - HuggingFace Spaces entry point
- `requirements_hf_spaces.txt` - Dependencies optimized for HF Spaces deployment
## Usage
### Local Development
```bash
# Install dependencies
pip install -r requirements_hf_spaces.txt
# Run the app
python gradio_pdf_app.py
```
### HuggingFace Spaces Deployment
1. Create a new HuggingFace Space with Gradio SDK
2. Upload the following files:
   - `app.py`
   - `gradio_pdf_app.py`
   - `utils/` (directory with utility functions)
   - `requirements_hf_spaces.txt` (rename to `requirements.txt`)
3. Configure the Space:
   - **SDK**: Gradio
   - **Hardware**: NVIDIA T4 Small (recommended)
   - **Python Version**: 3.9+
## Technical Details
### Memory Optimizations
- Uses `torch.float16` for GPU inference
- Smaller batch sizes (4) for element processing
- Memory cleanup with `torch.cuda.empty_cache()`
- Reduced max sequence length (2048) for generation
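The batching and cache cleanup listed above could look roughly like the sketch below. The names `chunked`, `free_gpu_cache`, and `process_in_batches` are hypothetical and not taken from the app's source; `run_batch` stands in for whatever function runs the model on a list of elements.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int = 4) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def free_gpu_cache() -> None:
    """Release cached CUDA memory if PyTorch and a GPU are present."""
    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed, so there is nothing to free
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def process_in_batches(items: Sequence, run_batch: Callable, batch_size: int = 4) -> List:
    """Run `run_batch` on small batches, clearing the CUDA cache after each."""
    results: List = []
    for batch in chunked(items, batch_size):
        results.extend(run_batch(batch))
        free_gpu_cache()
    return results
```

Keeping the batch size small bounds the peak activation memory per forward pass, at the cost of more passes per page.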
### PDF Processing Pipeline
1. **PDF to Images**: Uses PyMuPDF with 2x zoom for quality
2. **Layout Analysis**: DOLPHIN model parses document structure
3. **Element Extraction**: Processes text, tables, and figures separately
4. **Markdown Generation**: Converts results to formatted markdown
5. **Gallery View**: Creates overview of all processed pages
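Step 1 of the pipeline can be sketched with PyMuPDF as shown below. This is a minimal illustration, not the app's actual code: `pdf_to_images` and `zoomed_size` are hypothetical names, and the sketch assumes a PyMuPDF release that supports `import pymupdf`. PyMuPDF renders at 72 dpi by default, so a 2x zoom corresponds to roughly 144 dpi.

```python
from typing import Tuple


def zoomed_size(width_pt: float, height_pt: float, zoom: float = 2.0) -> Tuple[int, int]:
    """Approximate pixel size of a page rendered at `zoom` (1.0 = 72 dpi)."""
    return round(width_pt * zoom), round(height_pt * zoom)


def pdf_to_images(pdf_path: str, zoom: float = 2.0) -> list:
    """Render every page of a PDF to a PIL image at the given zoom factor."""
    # Imported here so the helper above works without PyMuPDF installed.
    import pymupdf
    from PIL import Image

    matrix = pymupdf.Matrix(zoom, zoom)  # scale both axes by `zoom`
    images = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            # alpha=False yields 3 bytes per pixel, matching PIL's "RGB" mode
            pix = page.get_pixmap(matrix=matrix, alpha=False)
            images.append(Image.frombytes("RGB", (pix.width, pix.height), pix.samples))
    return images
```

For example, a US Letter page (612 x 792 pt) rendered at 2x comes out at about 1224 x 1584 pixels, which is what `zoomed_size(612, 792)` returns.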
### Model Integration
- Uses HuggingFace transformers implementation
- Loads model with `device_map="auto"` for GPU optimization
- Batch processing for improved efficiency
- Graceful fallback to CPU if GPU unavailable
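The CPU fallback could be decided with a small helper like the one below. `choose_device` is a hypothetical name, and using `float32` on CPU is an assumption (this README only states that `float16` is used on GPU).

```python
from typing import Optional, Tuple


def choose_device(cuda_available: Optional[bool] = None) -> Tuple[str, str]:
    """Return (device, dtype name): float16 on GPU, float32 on CPU (assumed)."""
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False  # without PyTorch, report the CPU settings
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")
```

The dtype name can be turned into a PyTorch dtype with `getattr(torch, dtype_name)` before it is passed to the model loader.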
## Configuration
The app automatically detects the DOLPHIN model from one of the following sources:
- Local path: `./hf_model`
- HuggingFace Hub: `ByteDance/DOLPHIN`
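A minimal sketch of this detection is shown below, assuming the local directory takes precedence over the Hub ID when it exists. That precedence and the name `resolve_model_path` are assumptions, not confirmed by the app's source.

```python
import os

LOCAL_MODEL_DIR = "./hf_model"
HUB_MODEL_ID = "ByteDance/DOLPHIN"


def resolve_model_path(local_dir: str = LOCAL_MODEL_DIR, hub_id: str = HUB_MODEL_ID) -> str:
    """Prefer a local model directory if present, else use the Hub ID."""
    return local_dir if os.path.isdir(local_dir) else hub_id
```

The returned string can be passed to a `from_pretrained` call, which accepts either a local path or a Hub repository ID.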
## Dependencies
Core requirements:
- `torch>=2.1.0` - PyTorch for model inference
- `transformers>=4.47.0` - HuggingFace model loading
- `gradio>=5.36.0` - Web interface
- `pymupdf>=1.26.0` - PDF processing
- `pillow>=9.3.0` - Image processing
- `opencv-python-headless>=4.8.0` - Computer vision operations
## Error Handling
- Graceful handling of PDF conversion failures
- Memory management for large documents
- Progress reporting for long-running operations
- Fallback markdown generation if converter fails
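The fallback markdown generation could work along the lines of the sketch below. The element structure (a list of dicts with a `"text"` key) and the function names are hypothetical; `convert` stands in for the app's real markdown converter.

```python
from typing import Callable, Dict, List


def fallback_markdown(elements: List[Dict]) -> str:
    """Join the non-empty text of each element with blank lines."""
    parts = [str(e.get("text", "")).strip() for e in elements]
    return "\n\n".join(p for p in parts if p)


def to_markdown(elements: List[Dict], convert: Callable[[List[Dict]], str]) -> str:
    """Use the main converter, falling back to plain text if it fails."""
    try:
        return convert(elements)
    except Exception:
        # The converter failed: return the raw text rather than nothing.
        return fallback_markdown(elements)
```

The fallback loses table and figure formatting but still returns the recognized text to the user.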
## Performance Notes
- Optimized for NVIDIA T4 with 16 GB VRAM
- Processing time: ~30-60 seconds per page (depends on complexity)
- Memory usage: ~8-12 GB VRAM for typical documents
- CPU fallback available but significantly slower
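As a rough worked example of the per-page figure above, the helper below multiplies the page count by the quoted 30-60 second range. `estimate_seconds` is a hypothetical name, and the estimate covers GPU processing only, not model loading or upload time.

```python
from typing import Tuple


def estimate_seconds(pages: int, low: int = 30, high: int = 60) -> Tuple[int, int]:
    """Return a (best case, worst case) processing time in seconds."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low, pages * high
```

A 10-page document therefore takes roughly 300 to 600 seconds, or 5 to 10 minutes.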
## Example Output
The app generates:
1. **Markdown Preview**: Rendered document with LaTeX support
2. **Raw Markdown**: Source text for copying/editing
3. **Page Gallery**: Visual overview of all processed pages
4. **JSON Details**: Technical processing information
## Troubleshooting
- **Out of Memory**: Reduce batch size or use CPU
- **PDF Conversion Failed**: Check PDF format compatibility
- **Model Loading Error**: Verify model path and permissions
- **Slow Processing**: Ensure GPU is available and configured
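For the out-of-memory case, one option is to retry with a smaller batch size automatically. The sketch below is an illustration under stated assumptions, not the app's behavior: `run_with_backoff` and `run_batch` are hypothetical names, and the error type is a parameter so that `torch.cuda.OutOfMemoryError` can be passed in when PyTorch is used.

```python
from typing import Callable, List, Sequence, Tuple, Type


def run_with_backoff(
    items: Sequence,
    run_batch: Callable[[Sequence], List],
    batch_size: int = 4,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> List:
    """Process items in batches, halving the batch size after a memory error."""
    while True:
        try:
            results: List = []
            for start in range(0, len(items), batch_size):
                results.extend(run_batch(items[start:start + batch_size]))
            return results
        except oom_errors:
            if batch_size == 1:
                raise  # cannot shrink further; let the caller handle it
            batch_size //= 2  # restart from the beginning with smaller batches
```

Restarting from the first item keeps the logic simple. A version that resumes from the failed batch would avoid repeated work on long documents.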
## Credits
Built on the DOLPHIN model by ByteDance. Optimized for HuggingFace Spaces deployment.