Efficient sampling and feature selection in whole sentence maximum entropy language models

  • Authors:
  • S. F. Chen; R. Rosenfeld

  • Affiliations:
  • Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1999

Abstract

Conditional maximum entropy models have been successfully applied to estimating language model probabilities of the form P(w|h), but are often too demanding computationally. Furthermore, the conditional framework does not lend itself to expressing global sentential phenomena. We have previously introduced a non-conditional maximum entropy language model which directly models the probability of an entire sentence or utterance. The model treats each utterance as a "bag of features", where features are arbitrary computable properties of the sentence. Using the model is computationally straightforward since it does not require normalization. Training the model requires efficient sampling of sentences from an exponential distribution. In this paper, we further develop the model and demonstrate its feasibility and power. We compare the efficiency of several sampling techniques, implement smoothing to accommodate rare features, and suggest an efficient algorithm for improving the convergence rate. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analyzing competitive models in the Switchboard domain.
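
To make the model's form concrete, the sketch below scores a sentence as log p0(s) + Σ_i λ_i f_i(s) and draws sentences from the resulting exponential distribution with an independence Metropolis-Hastings sampler that uses the baseline p0 itself as the proposal. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vocabulary, feature functions, weights, and uniform baseline are all invented for the example (whole-sentence ME models typically build on a stronger baseline such as an n-gram model), and the paper compares several sampling techniques rather than just this one.

```python
import math
import random

# Toy setup -- vocabulary, features, and weights are invented for
# illustration; they are not the paper's actual feature set.
VOCAB = ["uh", "um", "the", "cat", "sat", "."]
MAX_LEN = 8

# "Bag of features": each f_i is an arbitrary computable property of
# the whole sentence, returning (here) 0.0 or 1.0.
FEATURES = {
    "has_filler":  lambda s: float(any(w in ("uh", "um") for w in s)),
    "is_short":    lambda s: float(len(s) <= 5),
    "ends_period": lambda s: float(len(s) > 0 and s[-1] == "."),
}
LAMBDAS = {"has_filler": 0.4, "is_short": -0.2, "ends_period": 0.1}

def propose():
    """Sample a sentence from the baseline p0: uniform length 1..MAX_LEN,
    words drawn uniformly from VOCAB (a stand-in for an n-gram model)."""
    return [random.choice(VOCAB) for _ in range(random.randint(1, MAX_LEN))]

def log_p0(sentence):
    """Log-probability of a sentence under the baseline defined above."""
    return -math.log(MAX_LEN) - len(sentence) * math.log(len(VOCAB))

def log_score(sentence):
    """Unnormalized log-probability: log p0(s) + sum_i lambda_i * f_i(s).
    Ranking sentences (e.g., N-best rescoring) needs no normalization."""
    return log_p0(sentence) + sum(
        LAMBDAS[name] * f(sentence) for name, f in FEATURES.items())

def sample(num_samples, burn_in=1000):
    """Independence Metropolis-Hastings targeting
    P(s) proportional to p0(s) * exp(sum_i lambda_i * f_i(s)),
    with p0 itself as the proposal. The p0 terms cancel, so the
    acceptance ratio reduces to exp(sum_i lambda_i * (f_i(s') - f_i(s)))."""
    current, samples = propose(), []
    for step in range(burn_in + num_samples):
        candidate = propose()
        log_ratio = sum(LAMBDAS[name] * (f(candidate) - f(current))
                        for name, f in FEATURES.items())
        if random.random() < math.exp(min(0.0, log_ratio)):
            current = candidate
        if step >= burn_in:
            samples.append(current)
    return samples

if __name__ == "__main__":
    draws = sample(10000)
    # With lambda_has_filler > 0, sentences containing a filler word
    # should be over-represented relative to the baseline p0.
    frac = sum(FEATURES["has_filler"](s) for s in draws) / len(draws)
    print(f"fraction of sampled sentences with a filler: {frac:.3f}")
```

Using the baseline as the proposal keeps the acceptance rule cheap when the exponential correction is mild; how well such samplers mix in practice, and how they compare with alternatives, is exactly the efficiency question the paper investigates.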