Statistical behavior analysis of smoothing methods for language models of Mandarin data sets

  • Authors:
  • Ming-Shing Yu; Feng-Long Huang; Piyu Tsai

  • Affiliations:
  • Department of Information Science, National Chung-Hsing University, Taichung, Taiwan; Department of Computer Science and Information Engineering, National United University, MiaoLi, Taiwan; Department of Computer Science and Information Engineering, National United University, MiaoLi, Taiwan

  • Venue:
  • AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
  • Year:
  • 2006

Abstract

In this paper, we discuss the statistical behavior and entropies of three smoothing methods, two well-known and one proposed, applied to three language models on Mandarin data sets. Because of data sparseness, smoothing methods are employed to estimate the probability of each event (both seen and unseen) in a language model. We propose a set of properties for analyzing the statistical behavior of the three smoothing methods. The proposed smoothing method complies with all of these properties. We implement the three language models on Mandarin data sets and then compare their entropies. In general, the entropies of the proposed smoothing method for the three models are lower than those of the other two methods.
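
As a rough illustration of the two ideas in the abstract (assigning nonzero probability to unseen events, and comparing models by entropy), the sketch below uses additive (add-delta) smoothing on a unigram model. This is an assumed stand-in: the abstract does not name the three methods it evaluates, and the function names, toy tokens, and vocabulary here are hypothetical.

```python
import math
from collections import Counter


def additive_unigram_probs(tokens, vocab, delta=1.0):
    """Additive smoothing (an assumed example method, not the paper's).

    Every event in `vocab`, seen or unseen, receives a nonzero probability.
    Assumes every training token belongs to `vocab`.
    """
    counts = Counter(tokens)
    total = len(tokens) + delta * len(vocab)
    return {w: (counts[w] + delta) / total for w in vocab}


def cross_entropy(probs, test_tokens):
    """Average negative log2 probability per test token, in bits."""
    return -sum(math.log2(probs[w]) for w in test_tokens) / len(test_tokens)


# Toy data (hypothetical): "d" is in the vocabulary but never seen in training.
train = ["a", "b", "a", "c"]
vocab = {"a", "b", "c", "d"}
probs = additive_unigram_probs(train, vocab)
```

With these toy counts the unseen event "d" still gets a probability, and the smoothed distribution sums to one. A lower cross-entropy on held-out text indicates a better fit, which is the sense in which the abstract compares the three methods.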