
OCR Quality Assessment Pipeline Demo

This demo showcases the OCR Quality Assessment Pipeline from the Impresso project, which analyzes and improves text produced by OCR (Optical Character Recognition).

Features

  • OCR Error Detection: Identifies common OCR mistakes and artifacts
  • Quality Assessment: Evaluates the overall quality of OCR text
  • Text Correction: Suggests improvements for detected errors
  • Interactive Interface: User-friendly Gradio web interface
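One common way to reduce "overall quality" to a single number is to measure what share of the input's word tokens appear in a reference lexicon. The following is a minimal sketch of that idea under stated assumptions: the function `quality_score`, the tiny German `LEXICON`, and the sample sentence are hypothetical, and this is not necessarily how the Impresso pipeline computes its score.

```python
import string

# Hypothetical, deliberately tiny lexicon (lower-case); a real one would be far larger.
LEXICON = {"zauberisch", "nach", "dem", "regen"}


def quality_score(text: str, lexicon: set[str]) -> float:
    """Return the share of word tokens found in `lexicon`, from 0.0 to 1.0.

    Assumptions of this sketch: tokens are whitespace-separated, leading and
    trailing punctuation is stripped, and matching is case-insensitive.
    """
    known = {w.lower() for w in lexicon}
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    if not tokens:
        return 0.0
    return sum(t in known for t in tokens) / len(tokens)


print(quality_score("Zaubrisch nacb d:m Regen.", LEXICON))  # 0.25
```

Only "Regen" is recognized here, so the score is 1/4. Internal punctuation such as the colon in "d:m" is kept, which makes that token count as unknown.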

Usage

The demo takes OCR text as input and returns:

  • Quality assessment scores
  • Detected OCR errors
  • Suggested corrections
  • Processed/improved text
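Conceptually, the processed text is the input with each flagged token replaced by its suggested correction. This minimal sketch assumes the corrections are already available as a token-to-replacement mapping; `apply_corrections`, `CORRECTIONS`, and the token pattern are hypothetical and not taken from the demo's code. Because only matched tokens are rewritten, spacing and surrounding punctuation are preserved.

```python
import re

# Hypothetical token -> replacement mapping, e.g. produced by an error detector.
CORRECTIONS = {"Zaubrisch": "Zauberisch", "nacb": "nach", "d:m": "dem"}

# Word tokens, optionally joined by internal colons so that "d:m" is one token.
TOKEN = re.compile(r"\w+(?::\w+)*")


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace tokens that exactly match a key in `corrections`; keep everything else."""

    def fix(match: re.Match) -> str:
        token = match.group(0)
        return corrections.get(token, token)

    return TOKEN.sub(fix, text)


print(apply_corrections("Zaubrisch, nacb d:m Regen.", CORRECTIONS))
# Zauberisch, nach dem Regen.
```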

Example

Try the provided German example text, which contains typical OCR errors such as:

  • Missing characters (e.g., "Zaubrisch" instead of "Zauberisch")
  • Character misrecognition (e.g., "nacb" instead of "nach", with "h" read as "b")
  • Spurious punctuation (e.g., "d:m" instead of "dem", with "e" read as ":")
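To illustrate how suggestions for errors like these could be produced, this sketch picks the closest lexicon entry by string similarity using Python's standard `difflib.get_close_matches`. This is a swapped-in technique for illustration only: the function `suggest_correction`, the hypothetical `LEXICON`, and the 0.6 cutoff (difflib's default) are assumptions, not the demo's actual method.

```python
import difflib
from typing import Optional

# Hypothetical lower-case lexicon; near-miss entries ("noch", "den") are included on purpose.
LEXICON = ["zauberisch", "nach", "noch", "dem", "den", "regen"]


def suggest_correction(token: str, lexicon: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the most similar lexicon entry for an unknown token, or None.

    Known tokens return None because there is nothing to correct.
    Comparison is done in lower case.
    """
    word = token.lower()
    if word in lexicon:
        return None
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for token in ["Zaubrisch", "nacb", "d:m", "Regen"]:
    print(token, "->", suggest_correction(token, LEXICON))
# Zaubrisch -> zauberisch
# nacb -> nach
# d:m -> dem
# Regen -> None
```

Similarity alone cannot restore capitalization or choose between equally close candidates, so a real corrector would typically add context (e.g. a language model) on top of a lookup like this.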

Installation

pip install -r requirements.txt
python app.py

Once started, the demo is available at http://localhost:7860 (Gradio's default port).