Individual optical character recognition (OCR) engines vary in the types of errors they commit in recognizing text, particularly poor-quality text. Aligning the output of multiple OCR engines and taking advantage of the differences between them yields a lattice of recognized words whose error rate is significantly lower than the word error rates of the individual engines. This lattice error rate constitutes a lower bound on the error rate attainable by choosing among the aligned alternatives in the OCR output. Results from a collection of poor-quality mid-twentieth-century typewritten documents demonstrate an average reduction of 55.0% in the error rate of the lattice of alternatives and a realized word error rate (WER) reduction of 35.8% in a dictionary-based selection process. As an important precursor, an innovative admissible heuristic for the A* algorithm is developed, which significantly reduces the state space explored to identify all optimal alignments of the OCR text output, a necessary step toward constructing the word hypothesis lattice. On average, only 0.0079% of the state space is explored to identify all optimal alignments of the documents.
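To make the alignment and selection steps concrete, the following is a minimal sketch, not the method described above. It aligns the word output of only two engines, returns one optimal alignment rather than all of them, and uses a simple length-difference heuristic in place of the heuristic developed in this work. The unit costs, the function names (`astar_align`, `select_words`), and the tiny dictionary are all assumptions made for illustration.

```python
import heapq

GAP = 1       # assumed cost of aligning a word against a gap
MISMATCH = 1  # assumed cost of aligning two different words


def heuristic(i, j, n, m):
    # Admissible lower bound: if the remaining lengths differ by k words,
    # at least k gaps are still needed. This is NOT the paper's heuristic.
    return GAP * abs((n - i) - (m - j))


def astar_align(a, b):
    """Return (cost, aligned pairs, expanded states) for one optimal alignment."""
    n, m = len(a), len(b)
    start, goal = (0, 0), (n, m)
    best = {start: 0}   # cheapest known cost to reach each state
    parent = {}         # back-pointers for reconstructing the alignment
    closed = set()      # states already expanded
    frontier = [(heuristic(0, 0, n, m), 0, start)]
    while frontier:
        _, g, (i, j) = heapq.heappop(frontier)
        if (i, j) in closed:
            continue
        closed.add((i, j))
        if (i, j) == goal:
            break
        moves = []
        if i < n and j < m:  # align a[i] with b[j]
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else MISMATCH))
        if i < n:            # a[i] against a gap
            moves.append((i + 1, j, GAP))
        if j < m:            # b[j] against a gap
            moves.append((i, j + 1, GAP))
        for ni, nj, step in moves:
            ng = g + step
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                parent[(ni, nj)] = (i, j)
                f = ng + heuristic(ni, nj, n, m)
                heapq.heappush(frontier, (f, ng, (ni, nj)))
    # Walk the back-pointers from the goal to recover the aligned columns.
    pairs, node = [], goal
    while node != start:
        pi, pj = parent[node]
        pairs.append((a[pi] if node[0] > pi else None,
                      b[pj] if node[1] > pj else None))
        node = (pi, pj)
    pairs.reverse()
    return best[goal], pairs, len(closed)


def select_words(pairs, dictionary):
    """Pick one word per aligned column, preferring dictionary words."""
    chosen = []
    for column in pairs:
        candidates = [w for w in column if w is not None]
        in_dict = [w for w in candidates if w.lower() in dictionary]
        chosen.append((in_dict or candidates)[0])
    return chosen


engine_a = "the quick brovvn fox".split()
engine_b = "the qu1ck brown fox".split()
cost, pairs, expanded = astar_align(engine_a, engine_b)
words = select_words(pairs, {"the", "quick", "brown", "fox"})
print(cost, words, expanded, (len(engine_a) + 1) * (len(engine_b) + 1))
```

In this toy case each engine makes one word error, but the aligned columns contain the correct word wherever at least one engine got it right, so a dictionary lookup can recover words that neither engine's output has in full. The ratio of `expanded` to the full grid size is the kind of state-space fraction the abstract reports, although the figures here say nothing about the real system.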