OCR error correction using a noisy channel model

Authors:
Okan Kolak; Philip Resnik
Affiliations:
University of Maryland, College Park, MD; University of Maryland, College Park, MD
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002

Citing 6

A statistical approach to machine translation
Computational Linguistics

Learning String-Edit Distance
IEEE Transactions on Pattern Analysis and Machine Intelligence

Automatic Rule Acquisition for Spelling Correction
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II

An improved error model for noisy channel spelling correction
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

Improved statistical alignment models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

Cited 7

Adaptive Hindi OCR using generalized Hausdorff image comparison
ACM Transactions on Asian Language Information Processing (TALIP)

A generative probabilistic OCR model for NLP applications
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

An image-based automatic Arabic translation system
Pattern Recognition

A discriminative candidate generator for string transformations
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Generalized syntactic and semantic models of query reformulation
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Using deep morphology to improve automatic error detection in Arabic handwriting recognition
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

An unsupervised and data-driven approach for spell checking in Vietnamese OCR-scanned texts
HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

Abstract

In this paper, we take a pattern recognition approach to correcting errors in text generated from printed documents using optical character recognition (OCR). We apply a very general, theoretically optimal model to the problem of OCR word correction, introduce practical methods for parameter estimation, and evaluate performance on real data.
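
As a rough illustration of the noisy channel idea named in the title, a correction can be chosen as the word W that maximizes P(W) * P(O | W), where O is the OCR output, P(W) is a language model, and P(O | W) is a channel (error) model. The sketch below is not the authors' model or their parameter estimation method: the vocabulary, the probabilities, the confusion table, and the restriction to same-length character substitutions are all assumptions made up for this example.

```python
import math

# Hypothetical toy language model: unigram word probabilities P(W).
WORD_PROBS = {"modern": 0.5, "modem": 0.3, "model": 0.2}

# Hypothetical channel parameters: P(observed char | true char) for a few
# character confusions, keyed as (true_char, observed_char).
CONFUSION = {("l", "1"): 0.1, ("o", "0"): 0.1, ("e", "c"): 0.05}
P_CORRECT = 0.85  # assumed probability that a character is read correctly
P_OTHER = 0.001   # assumed floor for any substitution not listed above


def channel_log_prob(observed: str, truth: str) -> float:
    """Return log P(observed | truth) under a substitution-only channel.

    Simplifying assumption: OCR never inserts or deletes characters, so
    strings of different length get probability zero (log = -inf).
    """
    if len(observed) != len(truth):
        return float("-inf")
    total = 0.0
    for true_char, seen_char in zip(truth, observed):
        if true_char == seen_char:
            prob = P_CORRECT
        else:
            prob = CONFUSION.get((true_char, seen_char), P_OTHER)
        total += math.log(prob)
    return total


def correct(observed: str) -> str:
    """Pick the vocabulary word maximizing log P(W) + log P(observed | W)."""
    return max(
        WORD_PROBS,
        key=lambda word: math.log(WORD_PROBS[word])
        + channel_log_prob(observed, word),
    )


if __name__ == "__main__":
    print(correct("m0de1"))
```

Working in log space avoids numeric underflow on longer words. A real system would also need insertions, deletions, and segmentation errors in the channel model, all of which this sketch leaves out.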