Integrating history-length interpolation and classes in language modeling

  • Authors: Hinrich Schütze
  • Affiliations: University of Stuttgart, Germany
  • Venue: HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year: 2011

Abstract

Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms for using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should preferably be applied to rare events. We construct such a model and show that both training on rare events and preferential application to rare events improve perplexity compared to a simple direct interpolation of class-based and standard language models.
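
To make the baseline in the abstract concrete, the following is a minimal sketch of a "simple direct interpolation" of a class-based bigram model with a standard word bigram model, together with a perplexity computation. It is not the model proposed in the paper: the class name InterpolatedLM, the fixed weight lam, the add-alpha smoothing, the toy corpus, and the hand-written word2class mapping are all assumptions made for illustration. In particular, the classes here are given by hand, not learned from rare events, and the same weight is applied to every history.

```python
import math
from collections import Counter


class InterpolatedLM:
    """Toy bigram LM: lam * class-based + (1 - lam) * word-based.

    Illustrative only; names and smoothing choices are assumptions.
    """

    def __init__(self, tokens, word2class, lam=0.5, alpha=1.0):
        self.w2c = word2class
        self.lam = lam      # fixed interpolation weight (assumption)
        self.alpha = alpha  # add-alpha smoothing constant (assumption)
        self.vocab = sorted(set(tokens))
        self.classes = sorted(set(word2class[w] for w in self.vocab))
        class_seq = [word2class[w] for w in tokens]
        # Word-level history and bigram counts.
        self.word_hist = Counter(tokens[:-1])
        self.word_bi = Counter(zip(tokens, tokens[1:]))
        # Class-level history and bigram counts.
        self.class_hist = Counter(class_seq[:-1])
        self.class_bi = Counter(zip(class_seq, class_seq[1:]))
        # Counts for the emission term p(w | class(w)).
        self.word_count = Counter(tokens)
        self.class_count = Counter(class_seq)

    def p_word(self, w, h):
        """Add-alpha smoothed word bigram p(w | h)."""
        num = self.word_bi[(h, w)] + self.alpha
        den = self.word_hist[h] + self.alpha * len(self.vocab)
        return num / den

    def p_class(self, w, h):
        """Class bigram p(c(w) | c(h)) * p(w | c(w))."""
        cw, ch = self.w2c[w], self.w2c[h]
        num = self.class_bi[(ch, cw)] + self.alpha
        den = self.class_hist[ch] + self.alpha * len(self.classes)
        emission = self.word_count[w] / self.class_count[cw]
        return (num / den) * emission

    def prob(self, w, h):
        """Direct linear interpolation of the two component models."""
        return self.lam * self.p_class(w, h) + (1 - self.lam) * self.p_word(w, h)

    def perplexity(self, tokens):
        """Perplexity over all bigrams of the given token sequence."""
        log_sum = sum(math.log(self.prob(w, h)) for h, w in zip(tokens, tokens[1:]))
        return math.exp(-log_sum / (len(tokens) - 1))


# Toy data; the class assignment is hand-written, not learned.
tokens = "the cat sat on the mat the dog sat on the rug".split()
word2class = {
    "the": "DET", "on": "PREP", "sat": "VERB",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
}
lm = InterpolatedLM(tokens, word2class, lam=0.3)
print(lm.prob("sat", "rug"), lm.perplexity(tokens))
```

In this toy corpus the bigram "rug sat" never occurs, so the word model can only fall back on smoothing, while the class model can draw on the observed NOUN-to-VERB transitions. That is the sense in which a class acts as a larger equivalence class for a rare or unseen history. A model closer to the abstract's proposal would presumably let the class component's weight depend on how rare the history is, which this sketch does not attempt.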