Entire relaxation path for maximum entropy problems

Authors:
Moshe Dubiner; Yoram Singer
Affiliations:
Google; Google
Venue:
EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes classical maximum entropy problems because it relaxes the standard equality constraints on the observed values. We tackle the problem by introducing a reparametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution-characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. Next, we use the reformulated problem to describe a space- and time-efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise-linear function in the two-dimensional space of relaxation and characterization parameters. We demonstrate the usability of our approach by applying it to Zipfian distributions over a large alphabet.
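The idea of distilling the unknown distribution to a single scalar can be illustrated with a minimal sketch. It assumes that the max-norm constraint is the per-coordinate box |p_i - p_obs_i| <= nu and that the prior q is strictly positive. Under these assumptions the KKT conditions give p_i = clip(c * q_i, lo_i, hi_i) for one scalar c > 0. The sketch finds c by plain bisection at a fixed nu. It does not reproduce the authors' path-tracking algorithm. The names `solve_relaxed_maxent`, `kl`, `zipf` and the Zipf exponents in the demo are hypothetical.

```python
import math


def solve_relaxed_maxent(q, p_obs, nu, max_iter=200):
    """Minimize KL(p || q) over distributions p with |p_i - p_obs_i| <= nu.

    Sketch only. It assumes the max-norm constraint is the per-coordinate box
    |p_i - p_obs_i| <= nu, that q is strictly positive, and that p_obs sums
    to one. The KKT conditions then give p_i = clip(c * q_i, lo_i, hi_i) for
    a single scalar c > 0, so only c has to be found.
    """
    lo = [max(t - nu, 0.0) for t in p_obs]  # lower bounds, kept nonnegative
    hi = [t + nu for t in p_obs]            # upper bounds

    def candidate(c):
        # Scale the prior by c, then clip every coordinate into its box.
        return [min(max(c * qi, l), h) for qi, l, h in zip(q, lo, hi)]

    # At c = 0 the mass is sum(lo) <= 1. At c_hi every coordinate sits at its
    # upper bound, so the mass is 1 + n * nu >= 1. The total mass is
    # nondecreasing in c, so bisection brackets the normalizing scalar.
    c_lo, c_hi = 0.0, max(h / qi for qi, h in zip(q, hi))
    for _ in range(max_iter):
        c_mid = 0.5 * (c_lo + c_hi)
        if sum(candidate(c_mid)) < 1.0:
            c_lo = c_mid
        else:
            c_hi = c_mid
    c = 0.5 * (c_lo + c_hi)
    return c, candidate(c)


def kl(p, q):
    # Relative entropy KL(p || q), using the convention 0 * log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def zipf(n, s):
    # Zipfian distribution with exponent s over an alphabet of size n.
    w = [1.0 / (k ** s) for k in range(1, n + 1)]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    prior = zipf(1000, 1.0)     # hypothetical prior q
    observed = zipf(1000, 1.3)  # hypothetical observed distribution
    for nu in (0.0, 1e-4, 1e-3, 1e-2):
        c, p = solve_relaxed_maxent(prior, observed, nu)
        print(f"nu={nu:g}  c={c:.6f}  KL(p||q)={kl(p, prior):.6f}")
```

At nu = 0 the sketch returns the observed distribution. Once nu reaches max_i |q_i - p_obs_i|, it returns the prior with c = 1. Suppose the set of coordinates clipped to their lower or upper bounds is held fixed. Then the normalization condition in this parametrization reads c * (sum of free q_i) + (sum of active bounds) = 1. That condition is linear in (nu, c). This agrees with the abstract's description of the path as piecewise linear in that plane. The sketch does not track the breakpoints. It simply re-solves the problem for each nu on a grid.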