Model Card: PHOCR
PHOCR: High-Performance OCR Toolkit
PHOCR is an open high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.
Motivation
Current token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this further into the per-mille range. At that level, for a system processing 100 million characters, the total number of recognition errors would be reduced to under 1 million, an order of magnitude improvement.
Features
- Custom Recognition Model: PH-OCRv1 achieves a sub-0.x% character error rate in document-style settings, well ahead of existing open-source models, and even reaches a 0.0x% character error rate on English
- Multi-language Support: Chinese, English, Japanese, Korean, Russian, and more
- Rich Vocabulary: Comprehensive vocabulary for each language: Chinese (15,316), Korean (17,388), Japanese (11,186), Russian (292)
- High Performance: Optimized inference engine with ONNX Runtime support
- Easy Integration: Simple Python API for quick deployment
- Cross-platform: Support for CPU and CUDA
Installation
```bash
# Choose ONE installation method below.

# Method 1: Install with the ONNX Runtime CPU version
pip install phocr[cpu]

# Method 2: Install with the ONNX Runtime GPU version
pip install phocr[cuda]
# Requires a properly installed CUDA toolkit and cuDNN library.
# You can install the CUDA runtime and cuDNN via conda:
conda install -c nvidia cuda-runtime=12.1 cudnn=9
# ...or install the corresponding CUDA toolkit and cuDNN libraries manually.

# Method 3: Manage ONNX Runtime yourself
# Install `onnxruntime` or `onnxruntime-gpu` first, then install PHOCR:
pip install phocr
```
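After installing the GPU build, it can help to confirm that ONNX Runtime actually sees the CUDA execution provider. This is a generic ONNX Runtime check, not a PHOCR-specific API:

```python
# Sanity check: list the execution providers ONNX Runtime can use.
# For the GPU install, "CUDAExecutionProvider" should appear in the output.
import onnxruntime as ort

print(ort.get_available_providers())
```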
Quick Start
```python
from phocr import PHOCR

# Initialize the OCR engine
engine = PHOCR()

# Run OCR on an image
result = engine("path/to/image.jpg")
print(result)

# Visualize results and export to Markdown
result.vis("output.jpg")
print(result.to_markdown())

# Recognition-only usage is also supported; see demo.py.
```
Benchmarks
We conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.
Overall Performance Comparison
All values are character error rates (CER); lower is better.

| Model | ZH & EN: English | ZH & EN: Simplified Chinese | ZH & EN: EN-CH Mixed | ZH & EN: Traditional Chinese | JP: Document | JP: Scene | KO: Document | KO: Scene | RU: Document |
|---|---|---|---|---|---|---|---|---|---|
| PHOCR | 0.0008 | 0.0057 | 0.0171 | 0.0145 | 0.0039 | 0.0197 | 0.0050 | 0.0255 | 0.0046 |
| Baidu | 0.0014 | 0.0069 | 0.0354 | 0.0431 | 0.0222 | 0.0607 | 0.0238 | 0.212 | 0.0786 |
| Ali | - | - | - | - | 0.0272 | 0.0564 | 0.0159 | 0.102 | 0.0616 |
| PP-OCRv5 | 0.0149 | 0.0226 | 0.0722 | 0.0625 | 0.0490 | 0.1140 | 0.0113 | 0.0519 | 0.0348 |
Notice
- Baidu: Baidu Accurate API
- Ali: Aliyun API
- CER: the total edit distance divided by the total number of characters in the ground truth (see the sketch below)
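To make this definition concrete, here is a minimal, self-contained sketch of CER computation in plain Python. It is not part of the PHOCR API; the benchmark scripts below are the authoritative implementation.

```python
def edit_distance(pred: str, truth: str) -> int:
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (p != t),   # substitution (0 cost if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(predictions: list[str], ground_truths: list[str]) -> float:
    """Total edit distance divided by the total number of ground-truth characters."""
    total_distance = sum(edit_distance(p, t) for p, t in zip(predictions, ground_truths))
    total_chars = sum(len(t) for t in ground_truths)
    return total_distance / total_chars


# One substitution over 5 ground-truth characters -> CER = 0.2
print(cer(["PH0CR"], ["PHOCR"]))
```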
Advanced Usage
A simple global KV cache implementation is provided for the PyTorch (CUDA) backend. When running with torch (CUDA), you can enable caching by passing use_cache=True to ORTSeq2Seq(...), which also allows larger batch sizes; a hedged sketch follows.
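The note above names only `ORTSeq2Seq(...)` and `use_cache=True`; everything else in the sketch below (the import path and the model-path argument) is a placeholder assumption. Consult demo.py for the actual call.

```python
# Hedged sketch only: the import path and the model-path argument are assumptions,
# not the documented API. Only use_cache=True comes from the note above.
from phocr import ORTSeq2Seq  # assumed import location

recognizer = ORTSeq2Seq(
    "path/to/recognition/model",  # placeholder argument
    use_cache=True,               # enable the global KV cache on the torch (CUDA) backend
)
# With caching enabled, larger batch sizes become feasible.
```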
Language-specific Configuration
See demo.py for more examples.
Evaluation & Benchmarking
PHOCR provides comprehensive benchmarking tools to evaluate model performance across different languages and scenarios.
Quick Benchmark
Run the complete benchmark pipeline:

```bash
sh benchmark/run_recognition.sh
```

Calculate the character error rate (CER) for model predictions:

```bash
sh benchmark/run_score.sh
```
Benchmark Datasets
PHOCR uses standardized benchmark datasets for fair comparison:
- zh_en_rec_bench: Chinese & English mixed text recognition
- jp_rec_bench: Japanese text recognition
- ko_rec_bench: Korean text recognition
- ru_rec_bench: Russian text recognition
Further Improvements
- Character error rate (CER), including punctuation, can be further reduced through additional normalization of the training corpus (a normalization sketch follows this list).
- Text detection accuracy can be further enhanced by employing a more advanced detection framework.
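As an illustration of the kind of corpus normalization meant above, here is a minimal sketch using only the Python standard library. The mappings shown are assumptions for illustration; the rules applied in PHOCR's actual training pipeline are not specified here.

```python
import unicodedata

# Hypothetical mapping for punctuation that NFKC leaves untouched;
# the real training pipeline's rules may differ.
PUNCT_MAP = str.maketrans({
    "。": ".",            # ideographic full stop -> ASCII period
    "、": ",",            # ideographic comma -> ASCII comma
    "“": '"', "”": '"',   # curly double quotes -> straight quotes
    "‘": "'", "’": "'",   # curly single quotes -> apostrophe
})


def normalize_text(text: str) -> str:
    """NFKC folds full-width Latin letters and punctuation; the table above
    handles a few additional marks."""
    return unicodedata.normalize("NFKC", text).translate(PUNCT_MAP)


print(normalize_text("Ｈｅｌｌｏ，ｗｏｒｌｄ。"))  # -> Hello,world.
```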
Contributing
We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
Support
For questions and support, please open an issue on GitHub or contact the maintainers.
Acknowledgements
Many thanks to RapidOCR for the detection model and the main framework.
License
- This project is released under the Apache 2.0 license
- The copyright of the OCR detection and classification model is held by Baidu
- The PHOCR recognition models are under the modified MIT License - see the LICENSE file for details
Citation
If you use PHOCR in your research, please cite:
```bibtex
@misc{phocr2025,
  title={PHOCR: High-Performance OCR Toolkit},
  author={PuHui Lab},
  year={2025},
  url={https://github.com/puhuilab/phocr}
}
```