---
tags:
- ocr
- image-to-text
license: mit
library_name: transformers
---

# Model Card: PHOCR

[PHOCR](https://github.com/puhuilab/phocr) is an open, high-performance Optical Character Recognition (OCR) toolkit.

# PHOCR: High-Performance OCR Toolkit

[English](README.md) | [简体中文](README_CN.md)

PHOCR is an open, high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages, including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. **PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.**

## Motivation

Current token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this further into the per-mille range. At that level, for a system processing 100 million characters, the total number of recognition errors would be reduced to under 1 million, an order-of-magnitude improvement.

## Features

- **Custom Recognition Model**: **PH-OCRv1** achieves a sub-0.x% character error rate in document-style settings by leveraging open-source models, and even reaches a 0.0x% character error rate in English.
- **Multi-language Support**: Chinese, English, Japanese, Korean, Russian, and more
- **Rich Vocabulary**: Comprehensive vocabulary for each language (Chinese: 15,316; Korean: 17,388; Japanese: 11,186; Russian: 292)
- **High Performance**: Optimized inference engine with ONNX Runtime support
- **Easy Integration**: Simple Python API for quick deployment
- **Cross-platform**: Support for CPU and CUDA

## Visualization

## Installation

```bash
# Choose **one** installation method below:

# Method 1: Install with the ONNX Runtime CPU version
pip install phocr[cpu]

# Method 2: Install with the ONNX Runtime GPU version
pip install phocr[cuda]
# Required: make sure the CUDA toolkit and cuDNN library are properly installed.
# You can install the CUDA runtime and cuDNN via conda:
conda install -c nvidia cuda-runtime=12.1 cudnn=9
# Or manually install the corresponding CUDA toolkit and cuDNN libraries.

# Method 3: Manually manage ONNX Runtime
# Install `onnxruntime` or `onnxruntime-gpu` yourself, then install PHOCR:
pip install phocr
```

## Quick Start

```python
from phocr import PHOCR

# Initialize OCR engine
engine = PHOCR()

# Perform OCR on an image
result = engine("path/to/image.jpg")
print(result)

# Visualize results
result.vis("output.jpg")
print(result.to_markdown())  # recognition results only
```
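The engine can be reused across many images. The snippet below is a minimal batch-processing sketch built only on the calls shown in the Quick Start; the directory and output paths are placeholders, and it assumes `to_markdown()` returns a string, as the `print` call above suggests.

```python
from pathlib import Path

from phocr import PHOCR

engine = PHOCR()  # initialize once and reuse for all images

# Placeholder paths: point these at your own data.
image_dir = Path("path/to/images")
output_dir = Path("path/to/output")
output_dir.mkdir(parents=True, exist_ok=True)

for image_path in sorted(image_dir.glob("*.jpg")):
    # Run OCR on one image at a time with the shared engine.
    result = engine(str(image_path))
    # Persist the recognized text as Markdown, one file per image.
    (output_dir / f"{image_path.stem}.md").write_text(
        result.to_markdown(), encoding="utf-8"
    )
```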
## Benchmarks

We conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. **Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.**

### Overall Performance Comparison

All values are character error rates (CER, lower is better).

| Model | ZH & EN: English | ZH & EN: Simplified Chinese | ZH & EN: EN-CH Mixed | ZH & EN: Traditional Chinese | JP: Document | JP: Scene | KO: Document | KO: Scene | RU: Document |
|-------|------------------|-----------------------------|----------------------|------------------------------|--------------|-----------|--------------|-----------|--------------|
| PHOCR | 0.0008 | 0.0057 | 0.0171 | 0.0145 | 0.0039 | 0.0197 | 0.0050 | 0.0255 | 0.0046 |
| Baidu | 0.0014 | 0.0069 | 0.0354 | 0.0431 | 0.0222 | 0.0607 | 0.0238 | 0.212 | 0.0786 |
| Ali | - | - | - | - | 0.0272 | 0.0564 | 0.0159 | 0.102 | 0.0616 |
| PP-OCRv5 | 0.0149 | 0.0226 | 0.0722 | 0.0625 | 0.0490 | 0.1140 | 0.0113 | 0.0519 | 0.0348 |

Notice:

- Baidu: [Baidu Accurate API](https://ai.baidu.com/tech/ocr/general)
- Ali: [Aliyun API](https://help.aliyun.com/zh/ocr/product-overview/recognition-of-characters-in-languages-except-for-chinese-and-english-1)
- CER: the total edit distance divided by the total number of characters in the ground truth.
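As a worked example of the CER definition in the Notice (total edit distance divided by total ground-truth characters), here is a minimal, self-contained sketch. The sample prediction/ground-truth pairs are made up for illustration, and the plain dynamic-programming Levenshtein routine stands in for the actual scoring pipeline (`benchmark/run_score.sh`, described below).

```python
def edit_distance(pred: str, gt: str) -> int:
    """Levenshtein distance between a prediction and its ground truth."""
    # Classic dynamic programming over character prefixes.
    prev = list(range(len(gt) + 1))
    for i, p_ch in enumerate(pred, start=1):
        curr = [i]
        for j, g_ch in enumerate(gt, start=1):
            cost = 0 if p_ch == g_ch else 1
            curr.append(min(prev[j] + 1,          # delete a predicted char
                            curr[j - 1] + 1,      # insert a ground-truth char
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Total edit distance divided by total ground-truth characters."""
    total_edits = sum(edit_distance(pred, gt) for pred, gt in pairs)
    total_chars = sum(len(gt) for _, gt in pairs)
    return total_edits / total_chars


# Made-up (prediction, ground truth) pairs for illustration only.
samples = [("PHOCR toolkit", "PHOCR toolkit"), ("0CR engine", "OCR engine")]
print(f"CER = {corpus_cer(samples):.4f}")
```

For these two sample pairs the script prints a CER of 1/23 ≈ 0.0435, since only a single substitution ('0' → 'O') is needed across 23 ground-truth characters.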
## Advanced Usage

We provide a simple PyTorch (CUDA) implementation with a global KV cache. When running with torch (CUDA), you can enable caching by setting `use_cache=True` in `ORTSeq2Seq(...)`, which also allows for larger batch sizes.

### Language-specific Configuration

See [demo.py](./demo.py) for more examples.

## Evaluation & Benchmarking

PHOCR provides comprehensive benchmarking tools to evaluate model performance across different languages and scenarios.

### Quick Benchmark

Run the complete benchmark pipeline:

```bash
sh benchmark/run_recognition.sh
```

Calculate the character error rate (CER) for model predictions:

```bash
sh benchmark/run_score.sh
```

### Benchmark Datasets

PHOCR uses standardized benchmark datasets for fair comparison:

- **zh_en_rec_bench**: [Chinese & English mixed text recognition](https://huggingface.co/datasets/puhuilab/zh_en_rec_bench)
- **jp_rec_bench**: [Japanese text recognition](https://huggingface.co/datasets/puhuilab/jp_rec_bench)
- **ko_rec_bench**: [Korean text recognition](https://huggingface.co/datasets/puhuilab/ko_rec_bench)
- **ru_rec_bench**: [Russian text recognition](https://huggingface.co/datasets/puhuilab/ru_rec_bench)

## Further Improvements

- The character error rate (CER), including punctuation, can be further reduced through additional normalization of the training corpus.
- Text detection accuracy can be further enhanced by employing a more advanced detection framework.

## Contributing

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.

## Support

For questions and support, please open an issue on GitHub or contact the maintainers.

## Acknowledgements

Many thanks to [RapidOCR](https://github.com/RapidAI/RapidOCR) for the detection models and the main framework.

## License

- This project is released under the Apache 2.0 license.
- The copyright of the OCR detection and classification models is held by Baidu.
- The PHOCR recognition models are under a modified MIT License; see the [LICENSE](./LICENSE) file for details.

## Citation

If you use PHOCR in your research, please cite:

```bibtex
@misc{phocr2025,
  title={PHOCR: High-Performance OCR Toolkit},
  author={PuHui Lab},
  year={2025},
  url={https://github.com/puhuilab/phocr}
}
```