An OCR post-processing approach based on multi-knowledge

  • Authors:
  • Li Zhuang;Xiaoyan Zhu

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Beijing, P.R. China;Department of Computer Science and Technology, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Beijing, P.R. China

  • Venue:
  • KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
  • Year:
  • 2005

Abstract

This paper proposes an OCR post-processing approach based on multi-knowledge, that is, on integrating language knowledge with the candidate distance information supplied by the OCR engine. The approach combines a statistical language model with a semantic lexicon, and uses the candidate distance information to reduce the size of the search space. Experimental results show that the approach is effective: after post-processing, the recognition accuracy on the test set rises from 58.45% to 83.73%, a 60.84% reduction in errors.
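The error-reduction figure follows from the two accuracy rates, and the arithmetic can be checked with a short sketch. The second function illustrates one plausible way that candidate distances could shrink a search space; it is an assumption for illustration only, since the abstract does not describe the actual pruning rule. The names `relative_error_reduction`, `prune_candidates`, and `max_gap` are hypothetical.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction of the original errors removed, given accuracies in percent."""
    err_before = 100.0 - acc_before  # error rate before post-processing
    err_after = 100.0 - acc_after    # error rate after post-processing
    return (err_before - err_after) / err_before


def prune_candidates(candidates, max_gap):
    """Keep candidates whose distance is within max_gap of the best one.

    candidates: list of (character, distance) pairs, smaller distance = better.
    max_gap: hypothetical threshold, not a value taken from the paper.
    """
    best = min(distance for _, distance in candidates)
    return [(ch, d) for ch, d in candidates if d - best <= max_gap]


# Accuracy rates reported in the abstract: 58.45% before, 83.73% after.
print(round(100 * relative_error_reduction(58.45, 83.73), 2))
```

With the reported rates, the error rate falls from 41.55% to 16.27%, and (41.55 - 16.27) / 41.55 gives the 60.84% stated in the abstract.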