OCR Quality Assessment Pipeline Demo
This demo showcases the OCR Quality Assessment Pipeline from the Impresso project, which analyzes and improves text produced by OCR (Optical Character Recognition).
Features
- OCR Error Detection: Identifies common OCR mistakes and artifacts
- Quality Assessment: Evaluates the overall quality of OCR text
- Text Correction: Suggests improvements for detected errors
- Interactive Interface: User-friendly Gradio web interface
Usage
The demo accepts OCR text input and provides:
- Quality assessment scores
- Detected OCR errors
- Suggested corrections
- Processed/improved text
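To make the idea of a quality assessment score concrete, here is a minimal sketch of one possible scoring scheme: the share of tokens that appear in a reference word list. This is an assumption for illustration only, not the pipeline's actual method; the function name `quality_score` and the `lexicon` parameter are hypothetical.

```python
def quality_score(text: str, lexicon: set[str]) -> float:
    """Return the fraction of tokens found in the lexicon (0.0 to 1.0).

    Hypothetical helper: the real pipeline may score text differently.
    """
    # Split on whitespace and strip punctuation from the token edges only,
    # so that an in-word artifact such as "d:m" stays a single token.
    tokens = [t.strip(".,;!?\"'()") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in lexicon)
    return known / len(tokens)


if __name__ == "__main__":
    # Toy lexicon; a real word list would be far larger.
    words = {"nach", "dem", "regen"}
    print(quality_score("nacb d:m Regen", words))
```

A score close to 1.0 would suggest clean text under this scheme, while a low score would suggest many unrecognized tokens. The result depends entirely on how well the word list covers the language and period of the source material.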
Example
Try the provided German text example, which contains typical OCR errors such as:
- Dropped characters (e.g., "Zaubrisch" instead of "Zauberisch")
- Character confusion (e.g., "nacb" instead of "nach")
- Letters misread as punctuation (e.g., "d:m" instead of "dem")
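The three example errors above can be used to sketch how a correction suggestion might work. The snippet below is an illustrative assumption that uses fuzzy matching from Python's standard `difflib` module against a small word list; it is not the pipeline's actual correction logic, and `suggest_corrections` and its parameters are hypothetical names.

```python
import difflib


def suggest_corrections(text: str, lexicon: set[str], cutoff: float = 0.6) -> dict[str, str]:
    """Map each unknown token to its closest lexicon entry, if one is similar enough.

    Hypothetical helper for illustration; suggestions are returned in lowercase.
    """
    candidates = sorted(lexicon)  # sorted for deterministic results
    suggestions = {}
    for raw in text.split():
        # Strip punctuation at the token edges only; "d:m" stays intact.
        token = raw.strip(".,;!?\"'()")
        if not token or token.lower() in lexicon:
            continue
        matches = difflib.get_close_matches(token.lower(), candidates, n=1, cutoff=cutoff)
        if matches:
            suggestions[token] = matches[0]
    return suggestions


if __name__ == "__main__":
    # Toy lexicon containing only the intended words from the example.
    words = {"zauberisch", "nach", "dem"}
    for wrong, right in suggest_corrections("Zaubrisch nacb d:m", words).items():
        print(f"{wrong} -> {right}")
```

With a realistic word list, short tokens such as "d:m" are likely to have several equally close candidates, so a production system would need context or frequency information to choose between them.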
Installation
pip install -r requirements.txt
python app.py
Once the app is running, the demo is available at http://localhost:7860 (Gradio's default port).