# MonkeyOCR-MLX: Apple Silicon Optimized OCR
A high-performance OCR application optimized for Apple Silicon with MLX-VLM acceleration, featuring advanced document layout analysis and intelligent text extraction.
## Key Features

- **MLX-VLM Optimization**: Native Apple Silicon acceleration using the MLX framework
- **3x Faster Processing**: Compared to standard PyTorch on M-series chips
- **Advanced AI**: Powered by the Qwen2.5-VL model with specialized layout analysis
- **Multi-format Support**: PDF, PNG, JPG, and JPEG with intelligent structure detection
- **Modern Web Interface**: Gradio interface for easy document processing
- **Batch Processing**: Efficient handling of multiple documents
- **High Accuracy**: Specialized for complex financial documents and tables
- **100% Private**: All processing happens locally on your Mac
## Performance Benchmarks

**Test: complex financial document (tax form)**

- MLX-VLM: ~15-18 seconds
- Standard PyTorch: ~25-30 seconds
- CPU only: ~60-90 seconds

**MacBook M4 Pro performance:**

- Model loading: ~1.7 s
- Text extraction: ~15 s
- Table structure: ~18 s
- Memory usage: ~13 GB peak
## Installation

### Prerequisites

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.11+
- 16 GB+ RAM (32 GB+ recommended for large documents)

### Quick Setup
1. Clone the repository:

   ```bash
   git clone https://huggingface.co/Jimmi42/MonkeyOCR-Apple-Silicon
   cd MonkeyOCR-Apple-Silicon
   ```

2. Install with UV (recommended):

   ```bash
   # Install UV if not already installed
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Install dependencies (includes mlx-vlm)
   uv sync
   ```

   Or install with pip:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the models (the first run will download them automatically):

   ```bash
   cd MonkeyOCR
   python download_model.py
   ```
## Usage

### Web Interface (Recommended)

```bash
# Activate the virtual environment
source .venv/bin/activate  # or `uv shell`

# Start the web app
python app.py
```

Access the interface at http://localhost:7861.

### Command Line

```bash
python main.py path/to/document.pdf
```
## Configuration

### MLX-VLM Optimization (Default)

The app automatically detects Apple Silicon and uses MLX-VLM for optimal performance:

```yaml
# model_configs_mps.yaml
device: mps
chat_config:
  backend: mlx         # MLX-VLM for maximum performance
  batch_size: 1
  max_new_tokens: 256
  temperature: 0.0
```
### Performance Backends

| Backend | Speed | Memory | Best For |
|---|---|---|---|
| `mlx` | Fastest | Low | Apple Silicon (recommended) |
| `transformers` | Moderate | Medium | Fallback option |
| `lmdeploy` | Slower | High | CUDA systems |
## Model Architecture

### Core Components

- **Layout Detection**: DocLayout-YOLO for document structure analysis
- **Vision-Language Model**: Qwen2.5-VL with MLX optimization
- **Layout Reading**: LayoutReader for reading-order optimization
- **MLX Framework**: Native Apple Silicon acceleration

### Apple Silicon Optimizations

- **Metal Performance Shaders**: Direct GPU acceleration
- **Unified Memory**: Optimized memory access patterns
- **Neural Engine**: Utilizes Apple's dedicated AI hardware
- **Float16 Precision**: Optimal speed/accuracy balance
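Float16 halves weight memory relative to float32, which matters on unified-memory Macs where the GPU and CPU share RAM. A back-of-the-envelope sketch (the ~7B parameter count is an illustrative assumption, not the exact size of the model used here):

```python
def model_weight_bytes(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB for a model with n_params parameters."""
    return n_params * bytes_per_param / 1024**3

n_params = 7_000_000_000
fp32 = model_weight_bytes(n_params, 4)  # float32: 4 bytes per parameter
fp16 = model_weight_bytes(n_params, 2)  # float16: 2 bytes per parameter

print(f"float32: {fp32:.1f} GiB, float16: {fp16:.1f} GiB")
```

At float16 a ~7B-parameter model needs roughly 13 GiB for weights alone, which is in the same ballpark as the peak memory figure reported above.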
## Perfect For

**Document types:**

- **Financial documents**: Tax forms, invoices, statements
- **Legal documents**: Contracts, forms, certificates
- **Academic papers**: Research papers, articles
- **Business documents**: Reports, presentations, spreadsheets

**Advanced features:**

- Complex table extraction with highlighted cells
- Multi-column layouts and mixed content
- Mathematical formulas and equations
- Structured data output (Markdown, JSON)
- Batch processing for multiple files
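To illustrate the structured-output idea, here is a minimal, hypothetical sketch (not the app's actual output code) that renders extracted table rows as both Markdown and JSON:

```python
import json

def table_to_markdown(header, rows):
    """Render a header row plus data rows as a Markdown table."""
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

def table_to_json(header, rows):
    """Render rows as a list of {column: value} objects."""
    return json.dumps([dict(zip(header, row)) for row in rows], indent=2)

# Example rows as an OCR pass might extract them (illustrative data)
header = ["Item", "Amount"]
rows = [["Tax withheld", "1,200.00"], ["Refund", "350.00"]]
print(table_to_markdown(header, rows))
print(table_to_json(header, rows))
```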
## Troubleshooting

### MLX-VLM Issues

```bash
# Test MLX-VLM availability
python -c "import mlx_vlm; print('MLX-VLM available')"

# Check whether the MLX backend is active
python -c "
import yaml
with open('model_configs_mps.yaml') as f:
    config = yaml.safe_load(f)
print(f'Backend: {config[\"chat_config\"][\"backend\"]}')
"
```
### Performance Issues

```bash
# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor memory usage during processing
top -pid $(pgrep -f "python app.py")
```
### Common Solutions

**Slow performance:**

- Ensure the backend is set to `mlx` in the config
- Check that `mlx-vlm` is installed: `pip install mlx-vlm`

**Memory issues:**

- Reduce image resolution before processing
- Close other memory-intensive applications
- Reduce `batch_size` to 1 in the config
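Reducing resolution can be as simple as capping the longest image side before OCR. A minimal sketch of the size calculation (the 2048-pixel cap is an illustrative choice, not a documented default):

```python
def capped_size(width: int, height: int, max_side: int = 2048):
    """Return (width, height) scaled so the longest side is at most max_side,
    preserving aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(capped_size(4000, 3000))  # -> (2048, 1536)
```

The resulting size can then be passed to any image library's resize call before handing the page to the model.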
**Port already in use:**

```bash
GRADIO_SERVER_PORT=7862 python app.py
```
## Project Structure

```
MonkeyOCR-MLX/
├── app.py                   # Gradio web interface
├── main.py                  # CLI interface
├── model_configs_mps.yaml   # MLX-optimized config
├── requirements.txt         # Dependencies (includes mlx-vlm)
├── torch_patch.py           # Compatibility patches
├── MonkeyOCR/               # Core AI models
│   └── magic_pdf/           # Processing engine
├── .gitignore               # Git ignore rules
└── README.md                # This file
```
## What's New in the MLX Version

- **MLX-VLM Integration**: Native Apple Silicon acceleration
- **3x Faster Processing**: Compared to standard PyTorch
- **Better Memory Efficiency**: Optimized for unified memory
- **Improved Accuracy**: Enhanced table and structure detection
- **Auto-Backend Selection**: Intelligently chooses the best backend
- **Performance Monitoring**: Built-in timing and metrics
## Technical Implementation

### MLX-VLM Backend (`MonkeyChat_MLX`)

- Direct MLX framework integration
- Optimized for Apple's Metal Performance Shaders
- Native unified memory management
- Specialized prompt processing for OCR tasks

### Fallback Mechanisms

- Automatic detection of MLX-VLM availability
- Graceful fallback to PyTorch transformers
- Cross-platform compatibility maintained
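The detection-and-fallback logic can be sketched as follows; this is a simplified illustration, and the actual function names in the codebase may differ:

```python
import importlib.util

def select_backend() -> str:
    """Pick the best available inference backend, in priority order.
    Falls back gracefully when an optional dependency is missing."""
    if importlib.util.find_spec("mlx_vlm") is not None:
        return "mlx"            # native Apple Silicon path
    if importlib.util.find_spec("transformers") is not None:
        return "transformers"   # PyTorch fallback
    return "cpu"                # last-resort placeholder

print(f"Selected backend: {select_backend()}")
```

Using `importlib.util.find_spec` checks availability without importing the heavy package, so startup stays fast on systems where MLX-VLM is absent.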
## Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments
- Apple MLX Team: For the incredible MLX framework
- MonkeyOCR Team: For the foundational OCR model
- Qwen Team: For the excellent Qwen2.5-VL model
- Gradio Team: For the beautiful web interface
- MLX-VLM Contributors: For the MLX vision-language integration
## Support

- **Bug Reports**: Create an issue
- **Discussions**: Hugging Face Discussions
- **Documentation**: See the troubleshooting section above
- Star the repository if you find it useful!

**Supercharged for Apple Silicon • Made with ❤️ for the MLX community**

*Experience the future of OCR with native Apple Silicon optimization.*