In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rates, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
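
To make the noisy channel idea concrete, the following is a minimal sketch of the decision rule: choose the true word w that maximizes P(w) · P(o | w) for an observed OCR string o. It is not the finite-state implementation described in the paper. The lexicon `SOURCE`, the confusion table `CONFUSION`, the constants `P_SAME` and `P_OTHER`, and the function names are all hypothetical, the scores are unnormalized toy values, and the channel is simplified to one-to-one character substitutions between strings of equal length.

```python
import math

# Hypothetical source model P(w): toy relative frequencies of true words.
SOURCE = {"modern": 0.5, "modem": 0.3, "model": 0.2}

# Hypothetical channel scores P(observed char | true char) for known confusions.
CONFUSION = {("l", "1"): 0.2, ("o", "0"): 0.1}
P_SAME = 0.9    # assumed score for a character that is read correctly
P_OTHER = 1e-4  # assumed smoothing score for any unlisted substitution


def channel_logprob(true_word, observed):
    """Log score of the observed string given the true word (equal length only)."""
    if len(true_word) != len(observed):
        return float("-inf")
    total = 0.0
    for t, o in zip(true_word, observed):
        if t == o:
            total += math.log(P_SAME)
        else:
            total += math.log(CONFUSION.get((t, o), P_OTHER))
    return total


def correct(observed):
    """Return the lexicon word maximizing log P(w) + log P(observed | w).

    Falls back to the observed string when no lexicon word has the same length.
    """
    best_word, best_score = observed, float("-inf")
    for word, prior in SOURCE.items():
        score = math.log(prior) + channel_logprob(word, observed)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

With these toy values, `correct("mode1")` returns `"model"`: the less frequent word wins because the l-to-1 confusion is far more plausible under the channel scores than an m-to-1 confusion would be. A fuller treatment would also need to handle insertions, deletions, and segmentation errors, which this sketch deliberately leaves out.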