OCR (Optical Character Recognition) scanners do not always recognize text documents with 100% accuracy, and the resulting spelling errors make the text hard to process further. This paper investigates spell checking for OCR-scanned text documents. First, we conduct a detailed analysis of the characteristics of the spelling errors produced by an OCR scanner. Then, we propose a fully automatic approach that combines the error detection and correction phases in a single scheme. The scheme is unsupervised and data-driven, which makes it suitable for resource-poor languages. Evaluated on a real Vietnamese dataset, our approach achieves acceptable performance (detection accuracy 86%, correction accuracy 71%). In addition, we provide a result analysis to show how accurate our approach can be.
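To make the idea of a combined, unsupervised detection and correction pipeline more concrete, the following is a minimal sketch and not the method proposed in this paper. It assumes a lexicon can be built from token counts in raw text, flags tokens missing from that lexicon as errors, and picks the closest lexicon entry by Levenshtein distance as the correction. All function names and the `min_count` and `max_dist` parameters are hypothetical choices made for illustration.

```python
from collections import Counter


def build_lexicon(corpus_tokens, min_count=1):
    """Assumed data-driven lexicon: token -> frequency in raw, unlabeled text."""
    counts = Counter(corpus_tokens)
    return {w: c for w, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def detect(tokens, lexicon):
    """Detection phase: indices of tokens that do not occur in the lexicon."""
    return [i for i, t in enumerate(tokens) if t not in lexicon]


def correct(token, lexicon, max_dist=2):
    """Correction phase: nearest lexicon entry, ties broken by higher frequency.

    Returns the token unchanged when no entry lies within max_dist edits.
    """
    best = None
    for word, count in lexicon.items():
        d = edit_distance(token, word)
        if d <= max_dist:
            key = (d, -count, word)
            if best is None or key < best[0]:
                best = (key, word)
    return best[1] if best else token


def spell_check(tokens, lexicon):
    """Run detection, then correction, over one tokenized OCR line."""
    flagged = set(detect(tokens, lexicon))
    return [correct(t, lexicon) if i in flagged else t
            for i, t in enumerate(tokens)]


if __name__ == "__main__":
    corpus = "the quick brown fox the quick brown dog".split()
    lexicon = build_lexicon(corpus)
    print(spell_check(["the", "qu1ck", "brown", "fox"], lexicon))
```

A lexicon lookup of this kind only catches non-word errors; an OCR output that happens to be a valid but wrong word would pass detection, which is one reason context-based models are commonly used for this task.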