Prototype Extraction and Adaptive OCR

  • Authors:
  • Yihong Xu;George Nagy

  • Affiliations:
  • Hewlett-Packard Lab., Palo Alto, CA;Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 1999

Quantified Score

Hi-index 0.14

Visualization

Abstract

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.