Hierarchical maximum entropy density estimation

Authors:
Miroslav Dudik;David M. Blei;Robert E. Schapire
Affiliations:
Princeton University, Princeton, NJ;Princeton University, Princeton, NJ;Princeton University, Princeton, NJ
Venue:
Proceedings of the 24th international conference on Machine learning
Year:
2007

Abstract

We study the problem of simultaneously estimating several densities where the datasets are organized into overlapping groups, such as a hierarchy. For this problem, we propose a maximum entropy formulation, which systematically incorporates the groups and allows us to share the strength of prediction across similar datasets. We derive general performance guarantees, and show how some previous approaches, such as hierarchical shrinkage and hierarchical priors, can be derived as special cases. We demonstrate the proposed technique on synthetic data and in a real-world application to modeling the geographic distributions of species hierarchically grouped in a taxonomy. Specifically, we model the geographic distributions of species in the Australian wet tropics and Northeast New South Wales. In these regions, small numbers of samples per species significantly hinder effective prediction. Substantial benefits are obtained by combining information across taxonomic groups.
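
The abstract mentions hierarchical shrinkage as a special case of the proposed formulation. The sketch below illustrates only that general idea on a finite domain; it is not the paper's maximum entropy method. A species with few samples has its empirical distribution pulled toward the pooled distribution of its taxonomic group. The function names, the weight n / (n + k), and the default k are assumptions made for illustration.

```python
from collections import Counter


def empirical(samples, support):
    """Empirical distribution of non-empty `samples` over a finite `support`."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in support}


def shrink_toward_group(child_samples, sibling_samples, support, k=5.0):
    """Shrink a child's empirical distribution toward its group's pooled one.

    Assumption: the mixing weight is w = n / (n + k), where n is the child's
    sample count and k is a hypothetical smoothing constant. Small n gives
    more weight to the group; large n gives more weight to the child.
    """
    pooled = list(child_samples)
    for sib in sibling_samples:
        pooled.extend(sib)
    group = empirical(pooled, support)
    n = len(child_samples)
    if n == 0:
        # No data for the child: fall back entirely on the group estimate.
        return group
    child = empirical(child_samples, support)
    w = n / (n + k)
    return {x: w * child[x] + (1.0 - w) * group[x] for x in support}


# Toy example: three sites, one rare species with a single record.
sites = ["a", "b", "c"]
rare = ["a"]
siblings = [["a", "b", "b", "c"], ["b", "c", "c", "a"]]
estimate = shrink_toward_group(rare, siblings, sites)
```

With a single record the rare species would otherwise receive zero probability at sites "b" and "c"; after shrinkage those sites keep some mass borrowed from the group, while site "a" remains the most likely.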