We present an extension to Jaynes’ maximum entropy principle that incorporates latent variables. The principle of latent maximum entropy we propose differs from both Jaynes’ maximum entropy principle and maximum likelihood estimation, but can yield better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization of information divergence, and it reveals an intimate connection between the latent maximum entropy and maximum likelihood principles. To select a final model, we generate a series of feasible candidates, calculate the entropy of each, and choose the model that attains the highest entropy. Our experimental results show that estimation based on the latent maximum entropy principle generally gives better results than maximum likelihood when estimating latent variable models on small observed data samples.
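To make the nonlinearity concrete, the latent maximum entropy problem can be sketched as follows; the symbols here (observed variable $x$, latent variable $z$, feature functions $f_i$, sample size $N$) are introduced for illustration and need not match the paper's exact notation:

$$
\max_{p}\; H(p)
\quad \text{subject to} \quad
\mathbb{E}_{p}\!\left[f_i(x,z)\right]
\;=\;
\frac{1}{N}\sum_{j=1}^{N}\sum_{z} p(z \mid x_j)\, f_i(x_j, z),
\qquad i = 1, \dots, K.
$$

Unlike standard maximum entropy, where the right-hand side is a fixed empirical average, here the constraint targets depend on $p$ itself through the posterior $p(z \mid x_j)$, which is what makes the feasibility problem nonlinear in general.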
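As a concrete illustration of combining expectation-maximization with iterative scaling, here is a minimal sketch in Python/NumPy for a toy, fully enumerable discrete log-linear model. Everything below is an illustrative assumption rather than the paper's implementation: the feature functions, the toy data, and the use of a single Generalized Iterative Scaling (GIS) step per EM iteration; in particular, the slack feature that exact GIS requires is omitted, so the update acts as a damped approximation.

```python
import numpy as np

# Toy latent-variable log-linear model p(x, z) ∝ exp(lam · f(x, z)),
# with small finite spaces so all expectations are exact sums.
X_VALS = [0, 1, 2]   # observed variable values (hypothetical)
Z_VALS = [0, 1]      # latent variable values (hypothetical)

def features(x, z):
    # Hypothetical binary feature functions f_i(x, z).
    return np.array([x == z, x > 0, z == 1], dtype=float)

K = 3    # number of features
C = 3.0  # GIS constant: an upper bound on sum_i f_i(x, z)

def joint(lam):
    """Normalized joint table p(x, z) under weights lam."""
    tab = np.array([[np.exp(lam @ features(x, z)) for z in Z_VALS]
                    for x in X_VALS])
    return tab / tab.sum()

def em_is(data, iters=200):
    lam = np.zeros(K)
    for _ in range(iters):
        p = joint(lam)
        # E-step: empirical feature targets with latent z filled in
        # by the current posterior p(z | x).
        target = np.zeros(K)
        for x in data:
            xi = X_VALS.index(x)
            pz = p[xi] / p[xi].sum()          # p(z | x)
            for zi, z in enumerate(Z_VALS):
                target += pz[zi] * features(x, z)
        target /= len(data)
        # M-step: one GIS update pushing model feature expectations
        # toward the E-step targets (damped, since the slack feature
        # making feature sums exactly C is omitted).
        model = sum(p[xi, zi] * features(x, z)
                    for xi, x in enumerate(X_VALS)
                    for zi, z in enumerate(Z_VALS))
        lam += np.log(np.maximum(target, 1e-12) /
                      np.maximum(model, 1e-12)) / C
    return lam

lam = em_is([0, 1, 1, 2, 0, 1])
print("weights:", lam)
```

A fixed point of this loop is exactly the feasibility condition described above: the model's feature expectations match the latent-completed empirical expectations. The sketch returns a single candidate; in the selection procedure the abstract describes, several feasible candidates (e.g., from different initializations) would be compared by their entropy $H(p)$ and the highest-entropy one kept.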