Automatic learning of language model structure

Authors:
Kevin Duh; Katrin Kirchhoff
Affiliations:
University of Washington, Seattle; University of Washington, Seattle
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models provide principled ways of including additional conditioning variables other than the preceding words, such as morphological or syntactic features. However, the number of possible choices for model parameters creates a large space of models that cannot be searched exhaustively. This paper presents an entirely data-driven model selection procedure based on genetic search, which is shown to outperform both knowledge-based and random selection procedures on two different language modeling tasks (Arabic and Turkish).
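To make the idea of genetic search over a space of model structures concrete, here is a minimal sketch. It is not the authors' procedure: the abstract does not describe the encoding, so the sketch assumes each candidate model is a bit string marking which conditioning factors are used. The factor names in `FACTORS` and the scoring function `toy_fitness` are hypothetical stand-ins; in a real setting the fitness would come from training a model and measuring something like perplexity on held-out data.

```python
import random

# Hypothetical conditioning factors (illustrative names, not from the paper).
FACTORS = ["word-1", "word-2", "stem-1", "stem-2", "tag-1", "tag-2"]


def toy_fitness(genes):
    """Invented stand-in for a held-out score such as negative perplexity.

    Rewards agreement with an arbitrary 'useful' factor subset and charges
    a small cost for every factor that is switched on.
    """
    useful = (1, 0, 1, 0, 1, 0)
    matches = sum(g == u for g, u in zip(genes, useful))
    return matches - 0.1 * sum(genes)


def tournament(pop, fitness, rng):
    """Binary tournament selection: return the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_search(fitness, n_genes, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Search bit strings of length n_genes for a high-fitness individual."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_genes))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randint(1, n_genes - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(g ^ 1 if rng.random() < mutation_rate else g
                          for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    best = genetic_search(toy_fitness, len(FACTORS))
    chosen = [name for name, g in zip(FACTORS, best) if g]
    print("selected factors:", chosen)
```

Because exhaustive search grows exponentially with the number of structural choices, a population-based search of this kind evaluates only a small sample of candidates per generation. The elitism step is a design choice in this sketch that keeps the best score from ever decreasing between generations.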