Automatic learning of language model structure

  • Authors:
  • Kevin Duh;Katrin Kirchhoff

  • Affiliations:
  • University of Washington, Seattle;University of Washington, Seattle

  • Venue:
  • COLING '04 Proceedings of the 20th international conference on Computational Linguistics
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models provide principled ways of including additional conditioning variables other than the preceding words, such as morphological or syntactic features. However, the number of possible choices for model parameters creates a large space of models that cannot be searched exhaustively. This paper presents an entirely data-driven model selection procedure based on genetic search, which is shown to outperform both knowledge-based and random selection procedures on two different language modeling tasks (Arabic and Turkish).