πŸš€ MonkeyOCR-MLX: Apple Silicon Optimized OCR

A high-performance OCR application optimized for Apple Silicon with MLX-VLM acceleration, featuring advanced document layout analysis and intelligent text extraction.

πŸ”₯ Key Features

  • ⚑ MLX-VLM Optimization: Native Apple Silicon acceleration using MLX framework
  • πŸš€ 3x Faster Processing: Compared to standard PyTorch on M-series chips
  • 🧠 Advanced AI: Powered by Qwen2.5-VL model with specialized layout analysis
  • πŸ“„ Multi-format Support: PDF, PNG, JPG, JPEG with intelligent structure detection
  • 🌐 Modern Web Interface: Beautiful Gradio interface for easy document processing
  • πŸ”„ Batch Processing: Efficient handling of multiple documents
  • 🎯 High Accuracy: Specialized for complex financial documents and tables
  • πŸ”’ 100% Private: All processing happens locally on your Mac

πŸ“Š Performance Benchmarks

Test: Complex Financial Document (Tax Form)

  • MLX-VLM: ~15-18 seconds ⚑
  • Standard PyTorch: ~25-30 seconds
  • CPU Only: ~60-90 seconds

MacBook M4 Pro Performance:

  • Model loading: ~1.7s
  • Text extraction: ~15s
  • Table structure: ~18s
  • Memory usage: ~13GB peak

πŸ›  Installation

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.11+
  • 16GB+ RAM (32GB+ recommended for large documents)
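You can confirm these prerequisites with a quick check before installing (a minimal sketch; the function name is ours, not part of the repo):

```python
import platform
import sys

def check_prerequisites():
    """Return (apple_silicon, python_ok) for the requirements above."""
    # Apple Silicon Macs report platform.machine() == "arm64"
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    python_ok = sys.version_info >= (3, 11)
    return apple_silicon, python_ok

if __name__ == "__main__":
    silicon, py = check_prerequisites()
    print(f"Apple Silicon: {silicon}, Python 3.11+: {py}")
```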

Quick Setup

  1. Clone the repository:

    git clone https://huggingface.co/Jimmi42/MonkeyOCR-Apple-Silicon
    cd MonkeyOCR-Apple-Silicon
    
  2. Install with UV (Recommended):

    # Install UV if not already installed
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Install dependencies (includes mlx-vlm)
    uv sync
    
  3. Or install with pip:

    pip install -r requirements.txt
    
  4. Download models (optional — the first run downloads them automatically):

    cd MonkeyOCR
    python download_model.py
    

πŸƒβ€β™‚οΈ Usage

Web Interface (Recommended)

# Activate virtual environment
source .venv/bin/activate  # or skip activation and prefix commands with `uv run`

# Start the web app
python app.py

Access the interface at http://localhost:7861

Command Line

python main.py path/to/document.pdf
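For batch processing, a simple driver can loop the CLI over a folder (a hedged sketch — it assumes `main.py` accepts one document path per invocation, as shown above):

```python
import subprocess
import sys
from pathlib import Path

SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg"}  # formats listed above

def find_documents(input_dir):
    """Collect supported documents, sorted for a stable processing order."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in SUPPORTED)

def batch_ocr(input_dir):
    """Run main.py once per document; check=True stops the batch on failure."""
    for doc in find_documents(input_dir):
        print(f"Processing {doc.name} ...")
        subprocess.run([sys.executable, "main.py", str(doc)], check=True)
```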

βš™οΈ Configuration

MLX-VLM Optimization (Default)

The app automatically detects Apple Silicon and uses MLX-VLM for optimal performance:

# model_configs_mps.yaml
device: mps
chat_config:
  backend: mlx  # MLX-VLM for maximum performance
  batch_size: 1
  max_new_tokens: 256
  temperature: 0.0

Performance Backends

| Backend | Speed | Memory | Best For |
|---|---|---|---|
| `mlx` | πŸš€πŸš€πŸš€ | 🟒 | Apple Silicon (recommended) |
| `transformers` | πŸš€πŸš€ | 🟑 | Fallback option |
| `lmdeploy` | πŸš€ | πŸ”΄ | CUDA systems |
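Backend choice can be automated with a platform probe along these lines (an illustrative sketch under our own naming; the app's actual selection logic may differ):

```python
import importlib.util
import platform

def pick_backend() -> str:
    """Prefer mlx on Apple Silicon, then transformers, then lmdeploy."""
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    if apple_silicon and importlib.util.find_spec("mlx_vlm") is not None:
        return "mlx"
    if importlib.util.find_spec("transformers") is not None:
        return "transformers"
    return "lmdeploy"
```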

🧠 Model Architecture

Core Components

  • Layout Detection: DocLayout-YOLO for document structure analysis
  • Vision-Language Model: Qwen2.5-VL with MLX optimization
  • Layout Reading: LayoutReader for reading order optimization
  • MLX Framework: Native Apple Silicon acceleration
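To illustrate the reading-order step, here is a deliberately simplified geometric sort (the real LayoutReader is a learned model, and `Region` is our own stand-in type):

```python
from dataclasses import dataclass

@dataclass
class Region:
    x: int       # left edge of the detected block
    y: int       # top edge of the detected block
    kind: str    # e.g. "text", "table", "figure"

def sort_reading_order(regions):
    """Order layout regions top-to-bottom, then left-to-right."""
    return sorted(regions, key=lambda r: (r.y, r.x))
```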

Apple Silicon Optimizations

  • Metal Performance Shaders: Direct GPU acceleration
  • Unified Memory: CPU and GPU share one memory pool, avoiding host/device copies
  • Lazy Evaluation: MLX builds compute graphs and materializes results only when needed
  • Float16 Precision: Optimal speed/accuracy balance

🎯 Perfect For

Document Types:

  • πŸ“Š Financial Documents: Tax forms, invoices, statements
  • πŸ“‹ Legal Documents: Contracts, forms, certificates
  • πŸ“„ Academic Papers: Research papers, articles
  • 🏒 Business Documents: Reports, presentations, spreadsheets

Advanced Features:

  • βœ… Complex table extraction with highlighted cells
  • βœ… Multi-column layouts and mixed content
  • βœ… Mathematical formulas and equations
  • βœ… Structured data output (Markdown, JSON)
  • βœ… Batch processing for multiple files

🚨 Troubleshooting

MLX-VLM Issues

# Test MLX-VLM availability
python -c "import mlx_vlm; print('βœ… MLX-VLM available')"

# Check if MLX backend is active
python -c "
import yaml
with open('model_configs_mps.yaml') as f:
    config = yaml.safe_load(f)
print(f'Backend: {config[\"chat_config\"][\"backend\"]}')
"

Performance Issues

# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor memory usage during processing
top -pid $(pgrep -f "python app.py")

Common Solutions

  1. Slow Performance:

    • Ensure MLX backend is set to mlx in config
    • Check that mlx-vlm is installed: pip install mlx-vlm
  2. Memory Issues:

    • Reduce image resolution before processing
    • Close other memory-intensive applications
    • Reduce batch_size to 1 in config
  3. Port Already in Use:

    GRADIO_SERVER_PORT=7862 python app.py
    
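For the image-resolution tip above, a small helper can compute a capped target size while preserving aspect ratio (illustrative; pair it with any image library's resize call):

```python
def downscale_target(width: int, height: int, max_side: int = 2048):
    """Return (width, height) capped at max_side on the longest edge."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough, keep as-is
    scale = max_side / longest        # shrink factor for the longest edge
    return round(width * scale), round(height * scale)
```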

πŸ“ Project Structure

MonkeyOCR-MLX/
β”œβ”€β”€ 🌐 app.py                    # Gradio web interface
β”œβ”€β”€ πŸ–₯️ main.py                   # CLI interface  
β”œβ”€β”€ βš™οΈ model_configs_mps.yaml    # MLX-optimized config
β”œβ”€β”€ πŸ“¦ requirements.txt          # Dependencies (includes mlx-vlm)
β”œβ”€β”€ πŸ› οΈ torch_patch.py           # Compatibility patches
β”œβ”€β”€ 🧠 MonkeyOCR/               # Core AI models
β”‚   └── 🎯 magic_pdf/           # Processing engine
β”œβ”€β”€ πŸ“„ .gitignore               # Git ignore rules
└── πŸ“š README.md                # This file

πŸ”₯ What's New in MLX Version

  • ✨ MLX-VLM Integration: Native Apple Silicon acceleration
  • πŸš€ 3x Faster Processing: Compared to standard PyTorch
  • πŸ’Ύ Better Memory Efficiency: Optimized for unified memory
  • 🎯 Improved Accuracy: Enhanced table and structure detection
  • πŸ”§ Auto-Backend Selection: Intelligently chooses best backend
  • πŸ“Š Performance Monitoring: Built-in timing and metrics
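The kind of stage timing reported above can be captured with a small context manager (an illustrative helper, not the app's actual monitoring code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print how long the wrapped block took, e.g. 'Text extraction: 15.02s'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```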

πŸ”¬ Technical Implementation

MLX-VLM Backend (MonkeyChat_MLX)

  • Direct MLX framework integration
  • Optimized for Apple's Metal Performance Shaders
  • Native unified memory management
  • Specialized prompt processing for OCR tasks

Fallback Mechanisms

  • Automatic detection of MLX-VLM availability
  • Graceful fallback to PyTorch transformers
  • Cross-platform compatibility maintained
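The graceful fallback amounts to a guarded import (a minimal sketch; `select_chat_backend` is an illustrative name, and the real code also constructs the matching model class such as MonkeyChat_MLX):

```python
def select_chat_backend() -> str:
    """Prefer the MLX backend; fall back to PyTorch transformers when
    mlx_vlm is unavailable (e.g. on non-Apple-Silicon machines)."""
    try:
        import mlx_vlm  # noqa: F401 -- availability probe only
        return "mlx"
    except ImportError:
        return "transformers"
```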

🀝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apple MLX Team: For the incredible MLX framework
  • MonkeyOCR Team: For the foundational OCR model
  • Qwen Team: For the excellent Qwen2.5-VL model
  • Gradio Team: For the beautiful web interface
  • MLX-VLM Contributors: For the MLX vision-language integration

πŸ“ž Support

  • πŸ› Bug Reports: Create an issue
  • πŸ’¬ Discussions: Hugging Face Discussions
  • πŸ“– Documentation: Check the troubleshooting section above
  • ⭐ Star the repository if you find it useful!

πŸš€ Supercharged for Apple Silicon β€’ Made with ❀️ for the MLX Community

Experience the future of OCR with native Apple Silicon optimization
