Medical Prescription OCR

Model Description

This model is a fine-tuned version of naver-clova-ix/donut-base, trained to recognize handwritten medical prescriptions. It pairs Donut's end-to-end image-to-text architecture with domain-specific training to extract text from doctors' handwritten notes.

Key Features

  • Specialized for medical prescription text extraction
  • Handles various handwriting styles
  • Trained with a gradual augmentation strategy for robustness
  • Includes integrated classification to identify prescription documents

Intended Use

This model is designed for:

  • Research in medical document digitization
  • Educational projects in healthcare technology
  • Proof-of-concept applications for prescription processing

Important: This model is NOT validated for clinical use and should not be used for actual medical diagnosis or prescription verification.

Training Details

Architecture

  • Base Model: NAVER Clova's Donut (Document Understanding Transformer)
  • Type: Vision Encoder-Decoder model
  • Framework: PyTorch Lightning

Training Strategy

The model was trained using a gradual augmentation approach:

  1. Early epochs: Basic augmentations (slight rotations, brightness adjustments)
  2. Later epochs: Advanced augmentations (perspective transforms, shadows, motion blur)

This strategy helps the model learn clean patterns first, then adapt to more challenging variations.
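The two-stage schedule described above can be sketched roughly as follows. This is an illustration only, not the author's training code: the cutoff `WARMUP_EPOCHS`, the helper names, and every parameter range are assumptions, and perspective transforms are left out for brevity.

```python
import random
from PIL import Image, ImageEnhance, ImageFilter

# Hypothetical cutoff; the model card does not say when the switch happens.
WARMUP_EPOCHS = 5

def basic_augment(image, rng):
    """Early-epoch augmentations: slight rotation and brightness jitter."""
    angle = rng.uniform(-3.0, 3.0)  # assumed range, in degrees
    image = image.rotate(angle, fillcolor=(255, 255, 255))
    factor = rng.uniform(0.9, 1.1)  # assumed brightness range
    return ImageEnhance.Brightness(image).enhance(factor)

def advanced_augment(image, rng):
    """Later-epoch augmentations: basic ones plus motion blur and a shadow band."""
    image = basic_augment(image, rng)
    if rng.random() < 0.5:
        # Horizontal motion blur: average five neighbouring pixels in each row.
        kernel = [0] * 10 + [1] * 5 + [0] * 10
        image = image.filter(ImageFilter.Kernel((5, 5), kernel, scale=5))
    if rng.random() < 0.5:
        # Shadow: darken a random vertical band of the page.
        width, height = image.size
        x0 = rng.randint(0, width // 2)
        x1 = x0 + max(1, width // 4)
        band = image.crop((x0, 0, x1, height))
        band = ImageEnhance.Brightness(band).enhance(0.6)
        image.paste(band, (x0, 0))
    return image

def augment_for_epoch(image, epoch, rng):
    """Pick the augmentation stage from the current epoch number."""
    if epoch < WARMUP_EPOCHS:
        return basic_augment(image, rng)
    return advanced_augment(image, rng)
```

In a training loop, `augment_for_epoch(image, epoch, rng)` would be called on each RGB sample before it is passed to the processor, so the same dataset becomes progressively harder as `epoch` grows.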

Performance Metrics

  • Character-level Accuracy: 71%
  • Word-level Accuracy: 84%
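The card does not state how these two metrics were computed. One common convention, shown here purely as an assumption, defines accuracy as one minus the normalized edit distance, taken over characters or over whitespace-separated words. The function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def accuracy(ref, hyp):
    """1 - (edit distance / reference length), clipped at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))

def char_accuracy(ref_text, hyp_text):
    """Accuracy over individual characters."""
    return accuracy(ref_text, hyp_text)

def word_accuracy(ref_text, hyp_text):
    """Accuracy over whitespace-separated words."""
    return accuracy(ref_text.split(), hyp_text.split())
```

For example, `word_accuracy("take two tablets daily", "take two tablet daily")` gives 0.75, since one of four reference words is wrong. The reported figures may rest on different definitions.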

How to Use

from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load the model and processor
processor = DonutProcessor.from_pretrained("chinmays18/medical-prescription-ocr")
model = VisionEncoderDecoderModel.from_pretrained("chinmays18/medical-prescription-ocr")

# Move to GPU if available and switch to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Process an image
image = Image.open("prescription.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

# Build the decoder prompt; add_special_tokens=False keeps the tokenizer
# from wrapping the task token in extra <s> ... </s> tokens
task_prompt = "<s_ocr>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids.to(device)

# Generate text with greedy decoding
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=512,
        num_beams=1,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
    )

# Decode the generated text
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
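Depending on how the task tokens are registered in the tokenizer, the decoded string may still contain tags such as <s_ocr>. A small cleanup helper along these lines can strip them. It is a sketch: the name `clean_donut_output` and the default token strings are assumptions, and in practice `processor.tokenizer.eos_token` and `processor.tokenizer.pad_token` would be passed in.

```python
import re

def clean_donut_output(sequence, eos_token="</s>", pad_token="<pad>"):
    """Remove end-of-sequence, padding, and task tags from a decoded string."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Drop tag-like task tokens such as <s_ocr> or </s_ocr>.
    sequence = re.sub(r"</?s_[^>]*>", " ", sequence)
    # Collapse the whitespace left behind by the removed tokens.
    return re.sub(r"\s+", " ", sequence).strip()
```

Calling `clean_donut_output(generated_text)` on text that is already clean returns it unchanged apart from whitespace normalization.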

Limitations and Biases

  1. Language: Primarily trained on English prescriptions
  2. Handwriting Styles: Performance varies with handwriting quality
  3. Medical Terminology: May struggle with rare drug names or abbreviations
  4. Image Quality: Best results with clear, well-lit images

Ethical Considerations

  • This model should not be used for actual medical diagnosis or prescription verification
  • Always have medical professionals verify any extracted prescription information
  • Be aware of patient privacy when processing medical documents

Citation

If you use this model in your research, please cite:

@misc{shrivastava2024medicalocr,
  author = {Chinmay Shrivastava},
  title = {Medical Prescription OCR},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  url = {https://huggingface.co/chinmays18/medical-prescription-ocr}
}

Acknowledgments

  • Base architecture: NAVER Clova AI's Donut team
  • Training framework: PyTorch Lightning
  • Dataset inspiration: IAM Handwriting Database

Dataset

The model was trained on chinmays18/medical-prescription-dataset.