---
tags:
- ocr
- image-to-text
license: mit
library_name: transformers
---

# Model Card: PHOCR

An open high-performance Optical Character Recognition (OCR) toolkit: [PHOCR](https://github.com/puhuilab/phocr).

# PHOCR: High-Performance OCR Toolkit

[English](README.md) | [简体中文](README_CN.md)

PHOCR is an open high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. **PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.**

## Motivation

Current token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this further into the per-mille range. At that level, for a system processing 100 million characters, the total number of recognition errors would be reduced to under 1 million, an order-of-magnitude improvement.

## Features

- **Custom Recognition Model**: **PH-OCRv1** achieves a character error rate in the 0.x% range in document-style settings by leveraging open-source models, and reaches the 0.0x% range on English.
- **Multi-language Support**: Chinese, English, Japanese, Korean, Russian, and more
- **Rich Vocabulary**: Comprehensive vocabulary for each language. Chinese: 15,316, Korean: 17,388, Japanese: 11,186, Russian: 292.
- **High Performance**: Optimized inference engine with ONNX Runtime support
- **Easy Integration**: Simple Python API for quick deployment
- **Cross-platform**: Support for CPU and CUDA

## Visualization

## Installation

```bash
# Choose **one** installation method below:

# Method 1: Install with ONNX Runtime CPU version
pip install phocr[cpu]

# Method 2: Install with ONNX Runtime GPU version
pip install phocr[cuda]
# Required: Make sure the CUDA toolkit and cuDNN library are properly installed
# You can install cuda runtime and cuDNN via conda:
conda install -c nvidia cuda-runtime=12.1 cudnn=9
# Or manually install the corresponding CUDA toolkit and cuDNN libraries

# Method 3: Manually manage ONNX Runtime
# You can install `onnxruntime` or `onnxruntime-gpu` yourself, then install PHOCR
pip install phocr
```

## Quick Start

```python
from phocr import PHOCR

# Initialize OCR engine
engine = PHOCR()

# Perform OCR on image
result = engine("path/to/image.jpg")
print(result)

# Visualize results
result.vis("output.jpg")
print(result.to_markdown())  ## only recognition
```

## Benchmarks

We conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. **Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.**

### Overall Performance Comparison
All values are character error rates (CER ↓, lower is better). The first four columns belong to the ZH & EN group; the remaining columns are grouped by language (JP, KO, RU) and split into document and scene settings.

| Model | English | Simplified Chinese | EN CH Mixed | Traditional Chinese | JP Document | JP Scene | KO Document | KO Scene | RU Document |
|---|---|---|---|---|---|---|---|---|---|
| PHOCR | 0.0008 | 0.0057 | 0.0171 | 0.0145 | 0.0039 | 0.0197 | 0.0050 | 0.0255 | 0.0046 |
| Baidu | 0.0014 | 0.0069 | 0.0354 | 0.0431 | 0.0222 | 0.0607 | 0.0238 | 0.212 | 0.0786 |
| Ali | - | - | - | - | 0.0272 | 0.0564 | 0.0159 | 0.102 | 0.0616 |
| PP-OCRv5 | 0.0149 | 0.0226 | 0.0722 | 0.0625 | 0.0490 | 0.1140 | 0.0113 | 0.0519 | 0.0348 |
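
The table reports CER but does not spell out how it is computed. The sketch below assumes the common definition (Levenshtein edit distance divided by the reference length); the benchmark's own evaluation may normalize text differently. The function names `edit_distance`, `cer`, and `expected_errors` are illustrative and are not part of the `phocr` package.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate under the assumed definition."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def expected_errors(cer_value: float, num_chars: int) -> int:
    """Rough number of character errors at a given CER and text volume."""
    return round(cer_value * num_chars)


# One substituted character in an 11-character reference.
print(cer("recognition", "recogmition"))

# English column of the table, applied to 100 million characters.
print(expected_errors(0.0008, 100_000_000))  # PHOCR: 80000
print(expected_errors(0.0149, 100_000_000))  # PP-OCRv5: 1490000
```

Read this way, a CER of 0.0008 corresponds to roughly 80 thousand errors per 100 million characters, which is the per-mille regime described in the Motivation section.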