This paper proposes an OCR post-processing approach that draws on multiple knowledge sources. It integrates linguistic knowledge with the candidate distance information produced by the OCR engine. A statistical language model is combined with a semantic lexicon, and the candidate distances are used to reduce the size of the search space. Experiments show that the approach is effective: after post-processing, recognition accuracy on the test set rises from 58.45% to 83.73%. This is a 60.84% relative reduction in the error rate, which falls from 41.55% to 16.27%.
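The abstract does not spell out the decoding procedure. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' method. It assumes that the OCR engine returns, for each character position, a list of `(character, distance)` candidates, where a lower distance means a closer match. Candidates are pruned with a hypothetical additive `margin` on the distance, which stands in for the paper's use of candidate distances to shrink the search space. A Viterbi search then combines a character bigram model with the distances. The semantic lexicon is not modeled. The names `prune_candidates`, `decode`, `error_reduction`, `margin`, `weight`, and `floor`, as well as the toy data, are all hypothetical. The sketch also reproduces the error-reduction arithmetic from the reported accuracies.

```python
from typing import Dict, List, Tuple

# One OCR candidate: (character, distance); a lower distance means a closer match.
Candidate = Tuple[str, float]


def prune_candidates(cands: List[Candidate], margin: float = 2.0,
                     max_k: int = 5) -> List[Candidate]:
    """Keep at most max_k candidates whose distance is within `margin` of the best.

    Hypothetical pruning rule standing in for the paper's use of candidate
    distances to reduce the search space.
    """
    ranked = sorted(cands, key=lambda c: c[1])
    best = ranked[0][1]
    return [c for c in ranked if c[1] <= best + margin][:max_k]


def decode(lattice: List[List[Candidate]],
           bigram_logp: Dict[Tuple[str, str], float],
           weight: float = 1.0, margin: float = 2.0,
           floor: float = -10.0) -> str:
    """Viterbi search over pruned candidates with a character bigram model.

    Path score = sum of log P(c_i | c_{i-1}) - weight * distance(c_i).
    Unseen bigrams receive the log-probability `floor`. Assumes the
    characters at each position are distinct.
    """
    pruned = [prune_candidates(c, margin) for c in lattice]
    # best[ch] = (score, path) of the highest-scoring partial path ending in ch
    best: Dict[str, Tuple[float, List[str]]] = {
        ch: (bigram_logp.get(("<s>", ch), floor) - weight * d, [ch])
        for ch, d in pruned[0]
    }
    for cands in pruned[1:]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for ch, d in cands:
            # Extend every surviving path with ch and keep the best extension.
            nxt[ch] = max(
                ((s + bigram_logp.get((prev, ch), floor) - weight * d, path + [ch])
                 for prev, (s, path) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


def error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative error-rate reduction in percent, given accuracies in percent."""
    err_before, err_after = 100.0 - acc_before, 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before


if __name__ == "__main__":
    # Toy lattice: the OCR engine ranks 'b' above 'h' at the second position.
    lattice = [[("t", 0.1), ("f", 0.4)],
               [("b", 0.2), ("h", 0.3), ("k", 5.0)],
               [("e", 0.1), ("c", 0.2)]]
    bigram_logp = {("<s>", "t"): -1.0, ("t", "h"): -0.5, ("h", "e"): -0.5,
                   ("t", "b"): -4.0, ("b", "e"): -2.0}
    print(decode(lattice, bigram_logp))             # the
    print(round(error_reduction(58.45, 83.73), 2))  # 60.84
```

On the toy lattice, taking the top-ranked OCR candidate at each position would yield "tbe". The bigram scores let the search recover "the". The distance margin removes 'k' before the search begins, which illustrates how candidate distances can reduce the number of paths the language model has to score. The last line confirms that raising accuracy from 58.45% to 83.73% removes about 60.84% of the original errors.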