To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.
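As an illustration of how character widths might be estimated from unsegmented text, the sketch below treats each word's measured pixel width as the sum of the widths of the characters in its transcript and solves the resulting linear system by least squares. This is an assumed formulation for illustration only: the abstract does not give the actual algorithm, and the function names `estimate_char_widths` and `char_positions`, the `ridge` parameter, and the sample measurements are all hypothetical.

```python
def estimate_char_widths(words, word_widths, ridge=1e-9):
    """Least-squares estimate of per-character widths from whole-word widths.

    Assumption: word width ~= sum of its character widths (spacing ignored).
    """
    alphabet = sorted(set("".join(words)))
    index = {ch: i for i, ch in enumerate(alphabet)}
    n = len(alphabet)

    # Build the normal equations (A^T A) x = A^T b, where row k of A
    # holds the count of each character in word k and b holds word widths.
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for word, width in zip(words, word_widths):
        counts = [0.0] * n
        for ch in word:
            counts[index[ch]] += 1.0
        for i in range(n):
            atb[i] += counts[i] * width
            for j in range(n):
                ata[i][j] += counts[i] * counts[j]

    # A tiny ridge term keeps the system solvable when the data are sparse.
    for i in range(n):
        ata[i][i] += ridge

    # Solve by Gaussian elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(ata, atb)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return dict(zip(alphabet, x))


def char_positions(word, widths, word_width):
    """Hypothetical helper: left edge of each character in a word image,
    scaling the estimated widths so they sum to the observed word width."""
    nominal = sum(widths[ch] for ch in word)
    scale = word_width / nominal
    starts, x = [], 0.0
    for ch in word:
        starts.append(x)
        x += widths[ch] * scale
    return starts


# Made-up word widths (in pixels) paired with their transcripts.
words = ["cat", "tab", "cab", "at", "ab", "tact"]
word_widths = [24, 28, 30, 16, 22, 30]
widths = estimate_char_widths(words, word_widths)
starts = char_positions("cat", widths, 24)
```

Because the fit pools evidence over many words, a few wrong characters in the transcript would perturb the estimates rather than invalidate them, which is in the spirit of the error tolerance the abstract describes. A practical system would also need to account for inter-character spacing and image noise, which this sketch ignores.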