Fundamentals of speech recognition
Fundamentals of speech recognition
A maximum entropy approach to natural language processing
Computational Linguistics
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Parallel Optimization: Theory, Algorithms and Applications
Parallel Optimization: Theory, Algorithms and Applications
Convex Optimization
The Journal of Machine Learning Research
Domain adaptation of natural language processing systems
Domain adaptation of natural language processing systems
The context-tree weighting method: basic properties
IEEE Transactions on Information Theory
Hi-index | 0.01 |
We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.