Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition

  • Authors:
  • Yuanxiang Li;Xiaoqing Ding;Chew Lim Tan

  • Affiliations:
  • National University of Singapore;Tsinghua University;National University of Singapore

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

It is crucial to use contextual information to improve the recognition accuracy of Chinese script in an offline, handwritten Chinese character-recognition system. However, with the increase in the number of candidates given by a character recognizer, contextual postprocessing using a word-based bigram is time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing in light of the complementary action between Chinese characters and Chinese words. On the basis of isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a big candidate set, which improves both the accuracy and efficiency of the candidate set (the cumulative accuracy of the top ten candidates is greatly boosted). Then, to further improve accuracy, word-based bigram postprocessing (WBP) is executed on a small candidate set. This method obtains high accuracy while paying attention to postprocessing speed at the same time. Experimental results for three Chinese scripts (about 66,000 characters in total) demonstrate the effectiveness of our method: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, 95.75% accuracy is achieved, which is equivalent to the accuracy of WBP executed on a big candidate set. However, our method is more than 100 times faster than that of WBP.