We present an extension to Jaynes’ maximum entropy principle that incorporates latent variables. The principle of latent maximum entropy we propose differs from both Jaynes’ maximum entropy principle and maximum likelihood estimation, but can yield better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization of information divergence, and it reveals an intimate connection between the latent maximum entropy and maximum likelihood principles. To select a final model, we generate a series of feasible candidates, calculate the entropy of each, and choose the model that attains the highest entropy. Our experimental results show that estimation based on the latent maximum entropy principle generally gives better results than maximum likelihood when estimating latent variable models on small observed data samples.
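To make the nonlinearity concrete, the latent maximum entropy problem can be sketched as follows; the symbols here (observed variable $x$, latent variable $z$, feature functions $f_i$, sample size $N$) are introduced for illustration and need not match the paper's exact notation:

$$
\max_{p}\; H(p)
\quad \text{subject to} \quad
\mathbb{E}_{p}\!\left[f_i(x,z)\right]
\;=\;
\frac{1}{N}\sum_{j=1}^{N}\sum_{z} p(z \mid x_j)\, f_i(x_j, z),
\qquad i = 1, \dots, K.
$$

Unlike standard maximum entropy, where the right-hand side is a fixed empirical average, here the constraint targets depend on $p$ itself through the posterior $p(z \mid x_j)$, which is what makes the feasibility problem nonlinear in general.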
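As a concrete illustration of combining expectation-maximization with iterative scaling, here is a minimal sketch in Python/NumPy for a toy, fully enumerable discrete log-linear model. Everything below is an illustrative assumption rather than the paper's implementation: the feature functions, the toy data, and the use of a single Generalized Iterative Scaling (GIS) step per EM iteration; in particular, the slack feature that exact GIS requires is omitted, so the update acts as a damped approximation.

```python
import numpy as np

# Toy latent-variable log-linear model p(x, z) ∝ exp(lam · f(x, z)),
# with small finite spaces so all expectations are exact sums.
X_VALS = [0, 1, 2]   # observed variable values (hypothetical)
Z_VALS = [0, 1]      # latent variable values (hypothetical)

def features(x, z):
    # Hypothetical binary feature functions f_i(x, z).
    return np.array([x == z, x > 0, z == 1], dtype=float)

K = 3    # number of features
C = 3.0  # GIS constant: an upper bound on sum_i f_i(x, z)

def joint(lam):
    """Normalized joint table p(x, z) under weights lam."""
    tab = np.array([[np.exp(lam @ features(x, z)) for z in Z_VALS]
                    for x in X_VALS])
    return tab / tab.sum()

def em_is(data, iters=200):
    lam = np.zeros(K)
    for _ in range(iters):
        p = joint(lam)
        # E-step: empirical feature targets with latent z filled in
        # by the current posterior p(z | x).
        target = np.zeros(K)
        for x in data:
            xi = X_VALS.index(x)
            pz = p[xi] / p[xi].sum()          # p(z | x)
            for zi, z in enumerate(Z_VALS):
                target += pz[zi] * features(x, z)
        target /= len(data)
        # M-step: one GIS update pushing model feature expectations
        # toward the E-step targets (damped, since the slack feature
        # making feature sums exactly C is omitted).
        model = sum(p[xi, zi] * features(x, z)
                    for xi, x in enumerate(X_VALS)
                    for zi, z in enumerate(Z_VALS))
        lam += np.log(np.maximum(target, 1e-12) /
                      np.maximum(model, 1e-12)) / C
    return lam

lam = em_is([0, 1, 1, 2, 0, 1])
print("weights:", lam)
```

A fixed point of this loop is exactly the feasibility condition described above: the model's feature expectations match the latent-completed empirical expectations. The sketch returns a single candidate; in the selection procedure the abstract describes, several feasible candidates (e.g., from different initializations) would be compared by their entropy $H(p)$ and the highest-entropy one kept.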