# DOLPHIN PDF Document AI - HuggingFace Spaces App
A Gradio-based web application for processing PDF documents using the DOLPHIN vision-language model. This app converts PDF files to images and processes them page by page to extract text, tables, and figures.
## Features
- **PDF Upload**: Upload PDF documents directly through the web interface
- **Page-by-Page Processing**: Converts PDF pages to high-quality images and processes each individually
- **Document Parsing**: Extracts text, tables, and figures using the DOLPHIN model
- **Markdown Output**: Generates clean markdown with embedded images and tables
- **Memory Optimized**: Designed for NVIDIA T4 GPU deployment on HuggingFace Spaces
- **Progress Tracking**: Real-time progress updates during processing
## Files
- `gradio_pdf_app.py` - Main Gradio application with PDF processing functionality
- `app.py` - HuggingFace Spaces entry point
- `requirements_hf_spaces.txt` - Dependencies optimized for HF Spaces deployment
## Usage
### Local Development
```bash
# Install dependencies
pip install -r requirements_hf_spaces.txt
# Run the app
python gradio_pdf_app.py
```
### HuggingFace Spaces Deployment
1. Create a new HuggingFace Space with the Gradio SDK
2. Upload the following files:
- `app.py` (a minimal entry-point sketch appears after this list)
- `gradio_pdf_app.py`
- `utils/` (directory with utility functions)
- `requirements_hf_spaces.txt` (rename to `requirements.txt`)
3. Configure the Space:
- **SDK**: Gradio
- **Hardware**: NVIDIA T4 Small (recommended)
- **Python Version**: 3.9+
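
For reference, a minimal `app.py` entry point might look like the sketch below. It assumes `gradio_pdf_app.py` exposes its interface as a module-level `demo` object; the actual attribute name in this repository may differ.

```python
# app.py - HuggingFace Spaces entry point (sketch)
# Assumes gradio_pdf_app exposes a module-level Gradio `demo` object.
from gradio_pdf_app import demo

if __name__ == "__main__":
    demo.launch()  # on Spaces, host/port come from the environment; no arguments needed
```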
## Technical Details
### Memory Optimizations
- Uses `torch.float16` for GPU inference
- A smaller batch size (4) for element processing (see the sketch after this list)
- Memory cleanup with `torch.cuda.empty_cache()`
- Reduced max sequence length (2048) for generation
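
Roughly how these settings might fit together in the element-processing loop; this is a sketch, and `model`, `processor`, and `crops` are illustrative names rather than the app's actual identifiers:

```python
import torch

BATCH_SIZE = 4      # small batches keep peak VRAM low on a T4
MAX_LENGTH = 2048   # cap generation length to bound memory and latency

def run_batches(model, processor, crops):
    """Run inference over image crops in small batches, releasing GPU cache in between."""
    outputs = []
    for start in range(0, len(crops), BATCH_SIZE):
        batch = processor(images=crops[start:start + BATCH_SIZE], return_tensors="pt")
        pixel_values = batch.pixel_values.to(model.device, dtype=model.dtype)
        with torch.no_grad():
            ids = model.generate(pixel_values=pixel_values, max_length=MAX_LENGTH)
        outputs.extend(processor.batch_decode(ids, skip_special_tokens=True))
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached GPU memory between batches
    return outputs
```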
### PDF Processing Pipeline
1. **PDF to Images**: Uses PyMuPDF with 2x zoom for higher rendering quality (see the sketch after this list)
2. **Layout Analysis**: DOLPHIN model parses document structure
3. **Element Extraction**: Processes text, tables, and figures separately
4. **Markdown Generation**: Converts results to formatted markdown
5. **Gallery View**: Creates overview of all processed pages
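
Step 1, rendering PDF pages to images with PyMuPDF at 2x zoom, might look roughly like this (a sketch; the function name is illustrative):

```python
import fitz  # PyMuPDF
from PIL import Image

def pdf_to_images(pdf_path, zoom=2.0):
    """Render each page at `zoom`x resolution and return PIL images."""
    images = []
    with fitz.open(pdf_path) as doc:
        matrix = fitz.Matrix(zoom, zoom)  # 2x zoom for sharper input to the model
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)
            images.append(Image.frombytes("RGB", (pix.width, pix.height), pix.samples))
    return images
```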
### Model Integration
- Uses the HuggingFace Transformers implementation of the model
- Loads the model with `device_map="auto"` for automatic GPU placement (loading sketch below)
- Batch processing for improved efficiency
- Graceful fallback to CPU if no GPU is available
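
A sketch of the loading logic; `VisionEncoderDecoderModel` is an assumption about the model class, and the actual code may use a different one:

```python
import torch
from transformers import AutoProcessor, VisionEncoderDecoderModel

def load_model(model_path):
    """Load DOLPHIN on GPU in half precision if possible; otherwise fall back to CPU (sketch)."""
    processor = AutoProcessor.from_pretrained(model_path)
    if torch.cuda.is_available():
        # device_map="auto" (requires `accelerate`) places weights on the GPU automatically
        model = VisionEncoderDecoderModel.from_pretrained(
            model_path, torch_dtype=torch.float16, device_map="auto"
        )
    else:
        model = VisionEncoderDecoderModel.from_pretrained(model_path)  # float32 on CPU
    return processor, model.eval()
```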
## Configuration
The app automatically detects and uses the DOLPHIN model, checking these locations in order (a short resolution sketch follows the list):
- Local path: `./hf_model`
- HuggingFace Hub: `ByteDance/DOLPHIN`
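
Resolution can be as simple as checking for the local directory first (illustrative sketch):

```python
import os

def resolve_model_path():
    # Prefer a locally bundled checkpoint; otherwise download from the Hugging Face Hub.
    return "./hf_model" if os.path.isdir("./hf_model") else "ByteDance/DOLPHIN"
```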
## Dependencies
Core requirements:
- `torch>=2.1.0` - PyTorch for model inference
- `transformers>=4.47.0` - HuggingFace model loading
- `gradio>=5.36.0` - Web interface
- `pymupdf>=1.26.0` - PDF processing
- `pillow>=9.3.0` - Image processing
- `opencv-python-headless>=4.8.0` - Computer vision operations
## Error Handling
- Graceful handling of PDF conversion failures
- Memory management for large documents
- Progress reporting for long-running operations
- Fallback markdown generation if the converter fails (see the sketch below)
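
In practice this amounts to wrapping the conversion step in a try/except and reporting progress through Gradio, along these lines (a sketch; `pdf_to_images` and `parse_page` are hypothetical helpers):

```python
import gradio as gr

def process_pdf(pdf_file, progress=gr.Progress()):
    """Convert a PDF and parse each page, surfacing failures in the UI (sketch)."""
    try:
        pages = pdf_to_images(pdf_file.name)  # pdf_to_images: hypothetical helper (see pipeline sketch)
    except Exception as exc:
        # Report conversion failures to the user instead of crashing the Space
        raise gr.Error(f"PDF conversion failed: {exc}")
    results = []
    for i, page in enumerate(pages):
        progress((i + 1) / len(pages), desc=f"Processing page {i + 1} of {len(pages)}")
        results.append(parse_page(page))  # parse_page: hypothetical per-page parser
    return results
```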
## Performance Notes
- Optimized for NVIDIA T4 with 16GB VRAM
- Processing time: ~30-60 seconds per page (depends on complexity)
- Memory usage: ~8-12GB VRAM for typical documents
- CPU fallback available but significantly slower
## Example Output
The app generates:
1. **Markdown Preview**: Rendered document with LaTeX support
2. **Raw Markdown**: Source text for copying/editing
3. **Page Gallery**: Visual overview of all processed pages
4. **JSON Details**: Technical processing information
## Troubleshooting
- **Out of Memory**: Reduce batch size or use CPU
- **PDF Conversion Failed**: Check PDF format compatibility
- **Model Loading Error**: Verify model path and permissions
- **Slow Processing**: Ensure GPU is available and configured
## Credits
Built on the DOLPHIN model by ByteDance. Optimized for HuggingFace Spaces deployment.