Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition

Authors:
Yuanxiang Li;Xiaoqing Ding;Chew Lim Tan
Affiliations:
National University of Singapore;Tsinghua University;National University of Singapore
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2002


Abstract

Contextual information is crucial for improving recognition accuracy in an offline handwritten Chinese character recognition system. However, as the number of candidates produced by the character recognizer grows, contextual postprocessing with a word-based bigram becomes time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing (WBP), exploiting the complementary roles of Chinese characters and Chinese words. Starting from the output of isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a large candidate set. This step improves both the accuracy of the top candidate and the quality of the candidate set (the cumulative accuracy of the top ten candidates is greatly boosted). Then, to improve accuracy further, WBP is executed on the resulting small candidate set. The method thus obtains high accuracy while keeping postprocessing fast. Experimental results for three Chinese scripts (about 66,000 characters in total) demonstrate its effectiveness: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, accuracy reaches 95.75%, which matches the accuracy of WBP executed directly on the large candidate set, while our method is more than 100 times faster.
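
The first stage described above can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything here is an assumption for illustration: the function name `rerank_candidates`, the log-score representation of recognizer output, the fixed `floor` used for unseen bigrams, and the max-product (Viterbi-style) form of the forward-backward search. Each candidate is scored by the best full path passing through it, and only the top `keep` candidates per position are retained for a later word-based stage, which is not shown.

```python
import math


def rerank_candidates(lattice, bigram_logp, keep=10, floor=math.log(1e-6)):
    """Re-rank and prune a candidate lattice with a character bigram.

    lattice: list of positions; each position is a non-empty list of
             (char, recognizer_log_score) pairs with distinct chars.
    bigram_logp: dict mapping (prev_char, char) to log P(char | prev_char).
    floor: hypothetical log-probability assigned to unseen bigrams.
    """
    n = len(lattice)

    def lp(a, b):
        return bigram_logp.get((a, b), floor)

    # Forward pass: best log score of any path ending at each candidate.
    fwd = [dict() for _ in range(n)]
    for i, cands in enumerate(lattice):
        for ch, score in cands:
            if i == 0:
                fwd[i][ch] = score
            else:
                fwd[i][ch] = score + max(
                    fwd[i - 1][p] + lp(p, ch) for p, _ in lattice[i - 1]
                )

    # Backward pass: best log score of any path continuing after each candidate.
    bwd = [dict() for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for ch, _ in lattice[i]:
            if i == n - 1:
                bwd[i][ch] = 0.0
            else:
                bwd[i][ch] = max(
                    lp(ch, nxt) + score + bwd[i + 1][nxt]
                    for nxt, score in lattice[i + 1]
                )

    # fwd + bwd is the score of the best full path through a candidate;
    # sort each position by it and keep only the strongest candidates.
    pruned = []
    for i, cands in enumerate(lattice):
        ranked = sorted(
            cands, key=lambda c: fwd[i][c[0]] + bwd[i][c[0]], reverse=True
        )
        pruned.append([ch for ch, _ in ranked[:keep]])
    return pruned


# Toy data (invented for illustration): the recognizer slightly prefers the
# wrong first character, but the bigram context favors the pair "中国".
toy_lattice = [
    [("申", math.log(0.5)), ("中", math.log(0.4))],
    [("国", math.log(0.6)), ("图", math.log(0.3))],
]
toy_bigrams = {("中", "国"): math.log(0.2)}
top1 = rerank_candidates(toy_lattice, toy_bigrams, keep=1)
```

In this toy case the context promotes "中" over "申" at the first position even though the recognizer scored "申" higher. Pruning to a small `keep` is what would let a more expensive word-based search run on far fewer candidates, which is the speed argument the abstract makes.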