Distribution-based pruning of backoff language models

  • Authors:
  • Jianfeng Gao;Kai-Fu Lee

  • Affiliations:
  • Microsoft Research China, China (both authors)

  • Venue:
  • ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2000

Abstract

We propose a distribution-based pruning of n-gram backoff language models. Instead of the conventional approach of pruning n-grams that are infrequent in the training data, we prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e. the probability that an n-gram occurs in a new document. Experimental results show that our method achieves a 7--9% reduction in word perplexity over conventional cutoff methods.
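To make the contrast concrete, the following is a minimal sketch (not the authors' exact method) of the two pruning criteria the abstract contrasts. The function names, the toy corpus, and the approximation of the n-gram distribution by document frequency (the fraction of training documents containing the n-gram) are all assumptions for illustration; the paper derives a more refined estimate of the probability that an n-gram occurs in a new document.

```python
# Hypothetical illustration: count-cutoff pruning vs. distribution-based
# pruning. All names and thresholds here are assumptions for the sketch.
from collections import Counter

def count_cutoff_prune(total_counts, cutoff=1):
    """Conventional cutoff: keep n-grams whose total training count > cutoff."""
    return {ng for ng, c in total_counts.items() if c > cutoff}

def distribution_prune(per_doc_counts, min_doc_prob=0.2):
    """Assumed variant: keep n-grams whose estimated probability of occurring
    in a new document exceeds min_doc_prob. The probability is crudely
    approximated here by document frequency over the training documents."""
    n_docs = len(per_doc_counts)
    doc_freq = Counter()
    for doc in per_doc_counts:
        doc_freq.update(set(doc))  # count each n-gram once per document
    return {ng for ng, df in doc_freq.items() if df / n_docs > min_doc_prob}

# Toy corpus of three "documents", each a bag of bigram counts.
docs = [
    Counter({("new", "york"): 1, ("rare", "pair"): 5}),
    Counter({("new", "york"): 2}),
    Counter({("new", "york"): 1, ("of", "the"): 3}),
]
totals = Counter()
for d in docs:
    totals.update(d)

# ("rare", "pair") survives the count cutoff because it is frequent in one
# document, but the distribution-based criterion prunes it because it is
# unlikely to appear in a new document.
print(count_cutoff_prune(totals, cutoff=2))
print(distribution_prune(docs, min_doc_prob=0.5))
```

The point of the toy example is that total frequency and document-level spread can disagree: an n-gram concentrated in a single training document can have a high count yet a low chance of occurring in unseen text.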