Combining Statistical Language Models via the Latent Maximum Entropy Principle

Authors:
Shaojun Wang;Dale Schuurmans;Fuchun Peng;Yunxin Zhao
Affiliations:
Department of Computing Science, University of Alberta, Canada;Department of Computing Science, University of Alberta, Canada;Department of Computer Science, University of Massachusetts at Amherst, USA;Department of Computer Engineering and Computer Science, University of Missouri at Columbia, USA
Venue:
Machine Learning
Year:
2005

Abstract

We present a unified probabilistic framework for statistical language modeling that can simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure, and semantic document information. Our approach is based on a statistical inference principle we recently proposed, the latent maximum entropy principle, which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which allow only observed features to be modeled. The ability to conveniently incorporate hidden variables lets us extend the expressiveness of language models while alleviating the need to pre-process the data to obtain explicitly observed features. We describe efficient algorithms for marginalization, inference, and normalization in our extended models. We then use these techniques to combine two standard forms of language models: local lexical models (Markov N-gram models) and global document-level semantic models (probabilistic latent semantic analysis). Our experimental results on the Wall Street Journal corpus show that we obtain an 18.5% reduction in perplexity compared to the baseline trigram model with Good-Turing smoothing.
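For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how perplexity and a relative perplexity reduction are commonly computed. It is not taken from the paper: the function names `perplexity` and `relative_reduction` and the toy probabilities are hypothetical and serve only to illustrate the arithmetic behind a figure such as "18.5% reduction".

```python
import math


def perplexity(token_probs):
    """Return exp of the average negative log-probability per token.

    token_probs: the probability a model assigned to each token of a
    held-out text (hypothetical input format, assumed for illustration).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def relative_reduction(baseline_ppl, new_ppl):
    """Fraction by which new_ppl is lower than baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Toy per-token probabilities, invented for illustration only.
baseline_probs = [0.10, 0.05, 0.20, 0.10]
combined_probs = [0.12, 0.07, 0.22, 0.13]

base = perplexity(baseline_probs)
comb = perplexity(combined_probs)
print(f"baseline perplexity: {base:.2f}")
print(f"combined perplexity: {comb:.2f}")
print(f"relative reduction:  {relative_reduction(base, comb):.1%}")
```

Under this definition, a model that assigns higher probability to the held-out tokens has lower perplexity, and the reduction is reported relative to the baseline value.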