Prototype Extraction and Adaptive OCR

Authors:
Yihong Xu; George Nagy
Affiliations:
Hewlett-Packard Lab., Palo Alto, CA; Rensselaer Polytechnic Institute, Troy, NY
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1999

Abstract

To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.
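The abstract does not say how character widths are estimated. As a minimal sketch, assuming each word's pixel width is approximately the sum of the widths of the characters in its transcript, the widths can be recovered by least squares over many words without segmenting any of them. The function names (`estimate_char_widths`, `locate_chars`), the `ridge` term, and the sample data are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def estimate_char_widths(words, ridge=1e-6):
    """Estimate per-character widths from (transcript, word_width) pairs.

    Assumption (not from the paper): word_width ~= sum of character widths.
    Solves the ridge-regularized normal equations (A^T A + ridge*I) w = A^T b,
    where row k of A holds the character counts of word k.
    """
    chars = sorted({c for text, _ in words for c in text})
    index = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    ata = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    atb = [0.0] * n
    for text, width in words:
        counts = Counter(text)
        for ci, ni in counts.items():
            i = index[ci]
            atb[i] += ni * width
            for cj, nj in counts.items():
                ata[i][index[cj]] += ni * nj
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for k in range(col, n):
                ata[r][k] -= f * ata[col][k]
            atb[r] -= f * atb[col]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = atb[i] - sum(ata[i][k] * w[k] for k in range(i + 1, n))
        w[i] = s / ata[i][i]
    return dict(zip(chars, w))


def locate_chars(text, word_width, widths):
    """Predict (start, end) pixel spans of each character inside a word by
    scaling the estimated widths to the observed word width."""
    total = sum(widths[c] for c in text)
    scale = word_width / total
    spans, x = [], 0.0
    for c in text:
        step = widths[c] * scale
        spans.append((x, x + step))
        x += step
    return spans


# Made-up sample: word widths in pixels paired with their transcripts.
sample = [("ab", 22), ("bc", 20), ("ca", 18), ("abc", 30), ("aab", 32)]
widths = estimate_char_widths(sample)
spans = locate_chars("ab", 22, widths)
```

Because the fit averages over many words, a small number of transcript errors shifts the estimates only slightly, which is in the spirit of the tolerance the abstract describes. A robust fit would be needed if errors were frequent.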