A hybrid post-processing system for offline handwritten Chinese script recognition

  • Authors:
  • Yuan-Xiang Li; Chew Lim Tan; Xiaoqing Ding

  • Affiliations:
  • PLA University of Science and Technology, Institute of Meteorology, 211101, Nanjing, People’s Republic of China; National University of Singapore, School of Computing, 117543, Singapore, Singapore; Tsinghua University, Department of Electronic Engineering, 100084, Beijing, People’s Republic of China

  • Venue:
  • Pattern Analysis & Applications
  • Year:
  • 2005

Abstract

In the recognition of offline handwritten Chinese scripts, contextual post-processing plays a vital role in improving accuracy. In this paper, we systematically analyze the key factors that affect the performance of contextual post-processing: statistical language models (LMs), candidate confidence, candidate set size, and search strategy. We then present a hybrid post-processing system that integrates the various kinds of information available. Next, we investigate seven LMs, four methods of estimating candidate confidence, and different candidate set sizes, and illustrate in detail their influence on the performance of contextual post-processing. Experimental results show that the performance of the LMs is affected by training corpus size, smoothing method, and model pruning, and that lower perplexity correlates with higher accuracy. A comparison of the different estimation methods shows that candidate confidence is vital to contextual post-processing. We also show that capturing the correct characters within a limited number of candidates is extremely important for good post-processing performance. By adopting hybrid post-processing, we obtain high accuracy while also keeping post-processing speed and memory use in check. The average recognition accuracy on three Chinese scripts (about 66,000 characters in total) reaches 97.65%, which corresponds to an error correction rate of about 87% relative to the 81.58% average accuracy before post-processing. Finally, we give some recommendations for choosing a suitable post-processing method for real script recognition tasks.
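The 87% figure can be checked from the two accuracies quoted in the abstract. The sketch below assumes that "error correction rate" means the fraction of pre-post-processing errors that are removed, which is consistent with the reported numbers; the function name `error_correction_rate` is a hypothetical helper, not something taken from the paper.

```python
def error_correction_rate(acc_before: float, acc_after: float) -> float:
    """Fraction of the original errors removed by post-processing.

    Assumed definition (not stated in the abstract):
        (errors_before - errors_after) / errors_before
    Both accuracies are percentages in the range [0, 100].
    """
    errors_before = 100.0 - acc_before
    errors_after = 100.0 - acc_after
    if errors_before <= 0:
        raise ValueError("no errors to correct before post-processing")
    return (errors_before - errors_after) / errors_before


# Accuracies quoted in the abstract.
rate = error_correction_rate(81.58, 97.65)
print(f"{rate:.1%}")  # about 87%
```

In other words, the error rate falls from 18.42% to 2.35%, so roughly 16.07 of every 18.42 original errors are corrected.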