Using data mining techniques and rough set theory for language modeling

  • Authors: Yong Chen; Kwok-Ping Chan

  • Affiliations: University of Hong Kong, The 54th Research Institute of CTE, China and Fudan University; University of Hong Kong

  • Venue: ACM Transactions on Asian Language Information Processing (TALIP)
  • Year: 2007

Abstract

In this article, we propose word suggestion, a new postprocessing strategy for Chinese character recognizers based on a multiple word trigger-pair language model. With word suggestion, a recognizer can even achieve a recognition rate higher than its top-n candidate recognition rate. To construct the multiple word trigger-pair model, we use data mining techniques to reduce the otherwise intensive computation. Furthermore, rough set theory is used for the first time to discover negatively correlated relationships between words, so that wrong words are not introduced during word suggestion.
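
As a rough illustration of the idea of trigger pairs, the following sketch counts word co-occurrences in a toy corpus and scores each pair. It is not the authors' method: the abstract gives no formulas, so the function names (`mine_pair_scores`, `suggest`), the `min_support` threshold, and the use of pointwise mutual information (PMI) are all assumptions made for illustration. In particular, PMI is only a stand-in for the rough-set analysis the article uses to find negatively correlated words.

```python
import math
from collections import Counter
from itertools import combinations


def mine_pair_scores(docs, min_support=2):
    """Return {(a, b): pmi} for word pairs that co-occur in at least
    min_support documents. Keys are alphabetically sorted tuples.
    (Hypothetical helper; PMI is an assumed scoring function.)"""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    scores = {}
    for (a, b), count in pair_df.items():
        if count >= min_support:  # support pruning, as in association mining
            scores[(a, b)] = math.log(count * n / (word_df[a] * word_df[b]))
    return scores


def suggest(context, candidates, scores):
    """Pick the candidate with the highest summed association to the
    context words; negative scores penalize unlikely words."""
    def total(word):
        return sum(scores.get(tuple(sorted((word, c))), 0.0) for c in context)
    return max(candidates, key=total)
```

A pair with a positive score behaves like a trigger pair (one word makes the other more likely), while a negative score marks words that tend not to appear together and so should not be suggested. The `candidates` argument need not be limited to a recognizer's top-n list, which is presumably how word suggestion can exceed the top-n candidate recognition rate.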