An OCR post-processing approach based on multi-knowledge

Authors:
Li Zhuang;Xiaoyan Zhu
Affiliations:
Department of Computer Science and Technology, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Beijing, P.R. China
Venue:
KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
Year:
2005

Abstract

This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge with the candidate distance information given by the OCR engine. In this approach, a statistical language model and a semantic lexicon are combined, and the candidate distance information is used to reduce the size of the search space. Experimental results show that the approach is effective: after post-processing, the recognition accuracy on the test set increases from 58.45% to 83.73%. The error rate thus falls from 41.55% to 16.27%, a relative error reduction of 60.84%.
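The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the OCR engine returns, for each character position, a list of (candidate, distance) pairs with smaller distances meaning better matches, that candidates far from the best distance can be discarded to shrink the search space, and that a bigram language model then selects the best remaining sequence by Viterbi search. All names (`prune_candidates`, `decode`, `max_gap`, `distance_weight`) and the toy bigram table are hypothetical, and the semantic lexicon is not modelled. The helper `relative_error_reduction` simply reproduces the arithmetic behind the reported 60.84% figure.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative drop in error rate (percent), given accuracies in percent."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before


def prune_candidates(candidates, max_gap):
    """Keep candidates whose distance is within max_gap of the best one.

    candidates: list of (char, distance); a smaller distance is assumed
    to mean a better match (assumption, not stated in the abstract).
    """
    best = min(dist for _, dist in candidates)
    return [(ch, dist) for ch, dist in candidates if dist - best <= max_gap]


def decode(lattice, bigram_logprob, max_gap=1.0, distance_weight=1.0):
    """Viterbi search over the pruned candidate lists.

    lattice: one candidate list per character position.
    bigram_logprob(prev, cur): log P(cur | prev); prev is None at the start.
    The path score is the sum of bigram log-probabilities minus the
    weighted OCR distances (a hypothetical scoring rule).
    """
    best = {None: (0.0, [])}  # last char -> (score, path so far)
    for position in lattice:
        nxt = {}
        for ch, dist in prune_candidates(position, max_gap):
            score, path = max(
                ((s + bigram_logprob(prev, ch) - distance_weight * dist, p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[ch] = (score, path + [ch])
        best = nxt
    return "".join(max(best.values(), key=lambda item: item[0])[1])


# Toy bigram model (hypothetical values): listed pairs are likely,
# everything else gets a low default log-probability.
TOY_BIGRAMS = {(None, "c"): -0.5, ("c", "a"): -0.5, ("a", "t"): -0.5}


def toy_bigram_logprob(prev, cur):
    return TOY_BIGRAMS.get((prev, cur), -5.0)


# The engine's top choices spell "cot"; the language model prefers "cat".
toy_lattice = [
    [("c", 0.1), ("e", 0.3)],
    [("o", 0.2), ("a", 0.3), ("x", 5.0)],  # "x" is pruned by max_gap
    [("t", 0.1), ("l", 0.4)],
]

if __name__ == "__main__":
    print(decode(toy_lattice, toy_bigram_logprob))
    print(round(relative_error_reduction(58.45, 83.73), 2))
```

In this sketch, pruning by distance gap rather than by a fixed top-k keeps more candidates where the engine is uncertain and fewer where it is confident; whether the paper prunes in this way is not stated in the abstract.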