πŸš€ MonkeyOCR-MLX: Apple Silicon Optimized OCR

A high-performance OCR application optimized for Apple Silicon with MLX-VLM acceleration, featuring advanced document layout analysis and intelligent text extraction.

πŸ”₯ Key Features

  • ⚑ MLX-VLM Optimization: Native Apple Silicon acceleration using MLX framework
  • πŸš€ 3x Faster Processing: Compared to standard PyTorch on M-series chips
  • 🧠 Advanced AI: Powered by Qwen2.5-VL model with specialized layout analysis
  • πŸ“„ Multi-format Support: PDF, PNG, JPG, JPEG with intelligent structure detection
  • 🌐 Modern Web Interface: Beautiful Gradio interface for easy document processing
  • πŸ”„ Batch Processing: Efficient handling of multiple documents
  • 🎯 High Accuracy: Specialized for complex financial documents and tables
  • πŸ”’ 100% Private: All processing happens locally on your Mac

πŸ“Š Performance Benchmarks

Test: Complex Financial Document (Tax Form)

  • MLX-VLM: ~15-18 seconds ⚑
  • Standard PyTorch: ~25-30 seconds
  • CPU Only: ~60-90 seconds

MacBook M4 Pro Performance:

  • Model loading: ~1.7s
  • Text extraction: ~15s
  • Table structure: ~18s
  • Memory usage: ~13GB peak

πŸ›  Installation

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.11+
  • 16GB+ RAM (32GB+ recommended for large documents)
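You can confirm these prerequisites with a quick check before installing (a minimal sketch; the function name is ours, not part of the repo):

```python
import platform
import sys

def check_prerequisites():
    """Return (apple_silicon, python_ok) for the requirements above."""
    # Apple Silicon Macs report platform.machine() == "arm64"
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    python_ok = sys.version_info >= (3, 11)
    return apple_silicon, python_ok

if __name__ == "__main__":
    silicon, py = check_prerequisites()
    print(f"Apple Silicon: {silicon}, Python 3.11+: {py}")
```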

Quick Setup

  1. Clone the repository:

    git clone https://huggingface.co/Jimmi42/MonkeyOCR-Apple-Silicon
    cd MonkeyOCR-Apple-Silicon
    
  2. Install with UV (Recommended):

    # Install UV if not already installed
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install dependencies (includes mlx-vlm)
    uv sync
    
  3. Or install with pip:

    pip install -r requirements.txt
    
  4. Download models (optional — the first run downloads them automatically):

    cd MonkeyOCR
    python download_model.py
    

πŸƒβ€β™‚οΈ Usage

Web Interface (Recommended)

# Activate virtual environment
source .venv/bin/activate  # or skip activation and prefix commands with `uv run`

# Start the web app
python app.py

Access the interface at http://localhost:7861

Command Line

python main.py path/to/document.pdf
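For batch processing, a simple driver can loop the CLI over a folder (a hedged sketch — it assumes `main.py` accepts one document path per invocation, as shown above):

```python
import subprocess
import sys
from pathlib import Path

SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg"}  # formats listed above

def find_documents(input_dir):
    """Collect supported documents, sorted for a stable processing order."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in SUPPORTED)

def batch_ocr(input_dir):
    """Run main.py once per document; check=True stops the batch on failure."""
    for doc in find_documents(input_dir):
        print(f"Processing {doc.name} ...")
        subprocess.run([sys.executable, "main.py", str(doc)], check=True)
```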

βš™οΈ Configuration

MLX-VLM Optimization (Default)

The app automatically detects Apple Silicon and uses MLX-VLM for optimal performance:

# model_configs_mps.yaml
device: mps
chat_config:
  backend: mlx  # MLX-VLM for maximum performance
  batch_size: 1
  max_new_tokens: 256
  temperature: 0.0

Performance Backends

| Backend | Speed | Memory | Best For |
|---|---|---|---|
| `mlx` | πŸš€πŸš€πŸš€ | 🟒 | Apple Silicon (recommended) |
| `transformers` | πŸš€πŸš€ | 🟑 | Fallback option |
| `lmdeploy` | πŸš€ | πŸ”΄ | CUDA systems |
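Backend choice can be automated with a platform probe along these lines (an illustrative sketch under our own naming; the app's actual selection logic may differ):

```python
import importlib.util
import platform

def pick_backend() -> str:
    """Prefer mlx on Apple Silicon, then transformers, then lmdeploy."""
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    if apple_silicon and importlib.util.find_spec("mlx_vlm") is not None:
        return "mlx"
    if importlib.util.find_spec("transformers") is not None:
        return "transformers"
    return "lmdeploy"
```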

🧠 Model Architecture

Core Components

  • Layout Detection: DocLayout-YOLO for document structure analysis
  • Vision-Language Model: Qwen2.5-VL with MLX optimization
  • Layout Reading: LayoutReader for reading order optimization
  • MLX Framework: Native Apple Silicon acceleration
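To illustrate the reading-order step, here is a deliberately simplified geometric sort (the real LayoutReader is a learned model, and `Region` is our own stand-in type):

```python
from dataclasses import dataclass

@dataclass
class Region:
    x: int       # left edge of the detected block
    y: int       # top edge of the detected block
    kind: str    # e.g. "text", "table", "figure"

def sort_reading_order(regions):
    """Order layout regions top-to-bottom, then left-to-right."""
    return sorted(regions, key=lambda r: (r.y, r.x))
```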

Apple Silicon Optimizations

  • Metal Performance Shaders: Direct GPU acceleration
  • Unified Memory: CPU and GPU share one memory pool, avoiding host/device copies
  • Lazy Evaluation: MLX builds compute graphs and materializes results only when needed
  • Float16 Precision: Optimal speed/accuracy balance

🎯 Perfect For

Document Types:

  • πŸ“Š Financial Documents: Tax forms, invoices, statements
  • πŸ“‹ Legal Documents: Contracts, forms, certificates
  • πŸ“„ Academic Papers: Research papers, articles
  • 🏒 Business Documents: Reports, presentations, spreadsheets

Advanced Features:

  • βœ… Complex table extraction with highlighted cells
  • βœ… Multi-column layouts and mixed content
  • βœ… Mathematical formulas and equations
  • βœ… Structured data output (Markdown, JSON)
  • βœ… Batch processing for multiple files

🚨 Troubleshooting

MLX-VLM Issues

# Test MLX-VLM availability
python -c "import mlx_vlm; print('βœ… MLX-VLM available')"

# Check if MLX backend is active
python -c "
import yaml
with open('model_configs_mps.yaml') as f:
    config = yaml.safe_load(f)
print(f'Backend: {config[\"chat_config\"][\"backend\"]}')
"

Performance Issues

# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor memory usage during processing
top -pid $(pgrep -f "python app.py")

Common Solutions

  1. Slow Performance:

    • Ensure MLX backend is set to mlx in config
    • Check that mlx-vlm is installed: pip install mlx-vlm
  2. Memory Issues:

    • Reduce image resolution before processing
    • Close other memory-intensive applications
    • Reduce batch_size to 1 in config
  3. Port Already in Use:

    GRADIO_SERVER_PORT=7862 python app.py
    
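For the image-resolution tip above, a small helper can compute a capped target size while preserving aspect ratio (illustrative; pair it with any image library's resize call):

```python
def downscale_target(width: int, height: int, max_side: int = 2048):
    """Return (width, height) capped at max_side on the longest edge."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough, keep as-is
    scale = max_side / longest        # shrink factor for the longest edge
    return round(width * scale), round(height * scale)
```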

πŸ“ Project Structure

MonkeyOCR-MLX/
β”œβ”€β”€ 🌐 app.py                    # Gradio web interface
β”œβ”€β”€ πŸ–₯️ main.py                   # CLI interface  
β”œβ”€β”€ βš™οΈ model_configs_mps.yaml    # MLX-optimized config
β”œβ”€β”€ πŸ“¦ requirements.txt          # Dependencies (includes mlx-vlm)
β”œβ”€β”€ πŸ› οΈ torch_patch.py           # Compatibility patches
β”œβ”€β”€ 🧠 MonkeyOCR/               # Core AI models
β”‚   └── 🎯 magic_pdf/           # Processing engine
β”œβ”€β”€ πŸ“„ .gitignore               # Git ignore rules
└── πŸ“š README.md                # This file

πŸ”₯ What's New in MLX Version

  • ✨ MLX-VLM Integration: Native Apple Silicon acceleration
  • πŸš€ 3x Faster Processing: Compared to standard PyTorch
  • πŸ’Ύ Better Memory Efficiency: Optimized for unified memory
  • 🎯 Improved Accuracy: Enhanced table and structure detection
  • πŸ”§ Auto-Backend Selection: Intelligently chooses best backend
  • πŸ“Š Performance Monitoring: Built-in timing and metrics
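The kind of stage timing reported above can be captured with a small context manager (an illustrative helper, not the app's actual monitoring code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print how long the wrapped block took, e.g. 'Text extraction: 15.02s'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```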

πŸ”¬ Technical Implementation

MLX-VLM Backend (MonkeyChat_MLX)

  • Direct MLX framework integration
  • Optimized for Apple's Metal Performance Shaders
  • Native unified memory management
  • Specialized prompt processing for OCR tasks

Fallback Mechanisms

  • Automatic detection of MLX-VLM availability
  • Graceful fallback to PyTorch transformers
  • Cross-platform compatibility maintained
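The graceful fallback amounts to a guarded import (a minimal sketch; `select_chat_backend` is an illustrative name, and the real code also constructs the matching model class such as MonkeyChat_MLX):

```python
def select_chat_backend() -> str:
    """Prefer the MLX backend; fall back to PyTorch transformers when
    mlx_vlm is unavailable (e.g. on non-Apple-Silicon machines)."""
    try:
        import mlx_vlm  # noqa: F401 -- availability probe only
        return "mlx"
    except ImportError:
        return "transformers"
```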

🀝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apple MLX Team: For the incredible MLX framework
  • MonkeyOCR Team: For the foundational OCR model
  • Qwen Team: For the excellent Qwen2.5-VL model
  • Gradio Team: For the beautiful web interface
  • MLX-VLM Contributors: For the MLX vision-language integration

πŸ“ž Support

  • πŸ› Bug Reports: Create an issue
  • πŸ’¬ Discussions: Hugging Face Discussions
  • πŸ“– Documentation: Check the troubleshooting section above
  • ⭐ Star the repository if you find it useful!

πŸš€ Supercharged for Apple Silicon β€’ Made with ❀️ for the MLX Community

Experience the future of OCR with native Apple Silicon optimization
