Improving Chinese/English OCR Performance by Using MCE-based Character-Pair Modeling and Negative Training

  • Authors:
  • Qiang Huo; Zhi-Dan Feng

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

In the past several years, we've been developing a high-performance OCR engine for machine-printed Chinese/English documents. We have reported previously (1) how to use character modeling techniques based on MCE (minimum classification error) training to achieve high recognition accuracy, and (2) how to use confidence-guided progressive search and fast match techniques to achieve high recognition efficiency. In this paper, we present two more techniques that help reduce search errors and improve the robustness of our character recognizer. They are (1) to use MCE-trained character-pair models to avoid error-prone character-level segmentation for some trouble cases, and (2) to perform an MCE-based negative training to improve the rejection capability of the recognition models on the hypothesized garbage images during the recognition process. The efficacy of the proposed techniques is confirmed by experiments in a benchmark test.