Model Card: PHOCR
PHOCR: High-Performance OCR Toolkit
PHOCR is an open high-performance Optical Character Recognition (OCR) toolkit designed for efficient text recognition across multiple languages including Chinese, Japanese, Korean, Russian, Vietnamese, and Thai. PHOCR features a completely custom-developed recognition model (PH-OCRv1) that significantly outperforms existing solutions.
Motivation
Current token-prediction-based model architectures are highly sensitive to the accuracy of contextual tokens. Repetitive patterns, even as few as a thousand instances, can lead to persistent memorization by the model. While most open-source text recognition models currently achieve character error rates (CER) in the percent range, our goal is to push this further into the per-mille range. At that level, for a system processing 100 million characters, the total number of recognition errors would be reduced to under 1 million, an order of magnitude improvement.
Features
- Custom Recognition Model: PH-OCRv1 achieves a sub-0.x% character error rate in document-style settings, well ahead of existing open-source models, and even reaches a 0.0x% character error rate on English
- Multi-language Support: Chinese, English, Japanese, Korean, Russian, and more
- Rich Vocabulary: Comprehensive vocabulary for each language: Chinese (15,316), Korean (17,388), Japanese (11,186), Russian (292)
- High Performance: Optimized inference engine with ONNX Runtime support
- Easy Integration: Simple Python API for quick deployment
- Cross-platform: Support for CPU and CUDA
Installation
```bash
# Choose ONE installation method below.

# Method 1: Install with the ONNX Runtime CPU version
pip install phocr[cpu]

# Method 2: Install with the ONNX Runtime GPU version
pip install phocr[cuda]
# Requires a properly installed CUDA toolkit and cuDNN library.
# You can install the CUDA runtime and cuDNN via conda:
conda install -c nvidia cuda-runtime=12.1 cudnn=9
# ...or install the corresponding CUDA toolkit and cuDNN libraries manually.

# Method 3: Manage ONNX Runtime yourself
# Install `onnxruntime` or `onnxruntime-gpu` first, then install PHOCR:
pip install phocr
```
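After installing the GPU build, it can help to confirm that ONNX Runtime actually sees the CUDA execution provider. This is a generic ONNX Runtime check, not a PHOCR-specific API:

```python
# Sanity check: list the execution providers ONNX Runtime can use.
# For the GPU install, "CUDAExecutionProvider" should appear in the output.
import onnxruntime as ort

print(ort.get_available_providers())
```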
Quick Start
```python
from phocr import PHOCR

# Initialize the OCR engine
engine = PHOCR()

# Run OCR on an image
result = engine("path/to/image.jpg")
print(result)

# Visualize results and export to Markdown
result.vis("output.jpg")
print(result.to_markdown())

# Recognition-only usage is also supported; see demo.py.
```
Benchmarks
We conducted comprehensive benchmarks comparing PHOCR with leading OCR solutions across multiple languages and scenarios. Our custom-developed PH-OCRv1 model demonstrates significant improvements over existing solutions.
Overall Performance Comparison
All values are character error rates (CER); lower is better.

| Model | ZH & EN: English | ZH & EN: Simplified Chinese | ZH & EN: EN-CH Mixed | ZH & EN: Traditional Chinese | JP: Document | JP: Scene | KO: Document | KO: Scene | RU: Document |
|---|---|---|---|---|---|---|---|---|---|
| PHOCR | 0.0008 | 0.0057 | 0.0171 | 0.0145 | 0.0039 | 0.0197 | 0.0050 | 0.0255 | 0.0046 |
| Baidu | 0.0014 | 0.0069 | 0.0354 | 0.0431 | 0.0222 | 0.0607 | 0.0238 | 0.212 | 0.0786 |
| Ali | - | - | - | - | 0.0272 | 0.0564 | 0.0159 | 0.102 | 0.0616 |
| PP-OCRv5 | 0.0149 | 0.0226 | 0.0722 | 0.0625 | 0.0490 | 0.1140 | 0.0113 | 0.0519 | 0.0348 |
Notice
- Baidu: Baidu Accurate API
- Ali: Aliyun API
- CER: the total edit distance divided by the total number of characters in the ground truth (see the sketch below)
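To make this definition concrete, here is a minimal, self-contained sketch of CER computation in plain Python. It is not part of the PHOCR API; the benchmark scripts below are the authoritative implementation.

```python
def edit_distance(pred: str, truth: str) -> int:
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (p != t),   # substitution (0 cost if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(predictions: list[str], ground_truths: list[str]) -> float:
    """Total edit distance divided by the total number of ground-truth characters."""
    total_distance = sum(edit_distance(p, t) for p, t in zip(predictions, ground_truths))
    total_chars = sum(len(t) for t in ground_truths)
    return total_distance / total_chars


# One substitution over 5 ground-truth characters -> CER = 0.2
print(cer(["PH0CR"], ["PHOCR"]))
```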
Advanced Usage
A simple global KV cache implementation is provided for the PyTorch (CUDA) backend. When running with torch (CUDA), you can enable caching by passing use_cache=True to ORTSeq2Seq(...), which also allows larger batch sizes; a hedged sketch follows.
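The note above names only `ORTSeq2Seq(...)` and `use_cache=True`; everything else in the sketch below (the import path and the model-path argument) is a placeholder assumption. Consult demo.py for the actual call.

```python
# Hedged sketch only: the import path and the model-path argument are assumptions,
# not the documented API. Only use_cache=True comes from the note above.
from phocr import ORTSeq2Seq  # assumed import location

recognizer = ORTSeq2Seq(
    "path/to/recognition/model",  # placeholder argument
    use_cache=True,               # enable the global KV cache on the torch (CUDA) backend
)
# With caching enabled, larger batch sizes become feasible.
```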
Language-specific Configuration
See demo.py for more examples.
Evaluation & Benchmarking
PHOCR provides comprehensive benchmarking tools to evaluate model performance across different languages and scenarios.
Quick Benchmark
Run the complete benchmark pipeline:

```bash
sh benchmark/run_recognition.sh
```

Calculate the character error rate (CER) for model predictions:

```bash
sh benchmark/run_score.sh
```
Benchmark Datasets
PHOCR uses standardized benchmark datasets for fair comparison:
- zh_en_rec_bench: Chinese & English mixed text recognition
- jp_rec_bench: Japanese text recognition
- ko_rec_bench: Korean text recognition
- ru_rec_bench: Russian text recognition
Further Improvements
- Character error rate (CER), including punctuation, can be further reduced through additional normalization of the training corpus (a normalization sketch follows this list).
- Text detection accuracy can be further enhanced by employing a more advanced detection framework.
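As an illustration of the kind of corpus normalization meant above, here is a minimal sketch using only the Python standard library. The mappings shown are assumptions for illustration; the rules applied in PHOCR's actual training pipeline are not specified here.

```python
import unicodedata

# Hypothetical mapping for punctuation that NFKC leaves untouched;
# the real training pipeline's rules may differ.
PUNCT_MAP = str.maketrans({
    "。": ".",            # ideographic full stop -> ASCII period
    "、": ",",            # ideographic comma -> ASCII comma
    "“": '"', "”": '"',   # curly double quotes -> straight quotes
    "‘": "'", "’": "'",   # curly single quotes -> apostrophe
})


def normalize_text(text: str) -> str:
    """NFKC folds full-width Latin letters and punctuation; the table above
    handles a few additional marks."""
    return unicodedata.normalize("NFKC", text).translate(PUNCT_MAP)


print(normalize_text("Ｈｅｌｌｏ，ｗｏｒｌｄ。"))  # -> Hello,world.
```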
Contributing
We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
Support
For questions and support, please open an issue on GitHub or contact the maintainers.
Acknowledgements
Many thanks to RapidOCR for the detection model and the main framework.
License
- This project is released under the Apache 2.0 license
- The copyright of the OCR detection and classification model is held by Baidu
- The PHOCR recognition models are under the modified MIT License - see the LICENSE file for details
Citation
If you use PHOCR in your research, please cite:
```bibtex
@misc{phocr2025,
  title={PHOCR: High-Performance OCR Toolkit},
  author={PuHui Lab},
  year={2025},
  url={https://github.com/puhuilab/phocr}
}
```