The standard maximum margin approach to structured prediction lacks a straightforward probabilistic interpretation of the learning scheme and the prediction rule. As a result, its unique advantages, such as dual sparsity and the kernel trick, cannot easily be combined with the merits of a probabilistic model, such as Bayesian regularization, model averaging, and the ability to handle hidden variables. In this paper, we present a new general framework called maximum entropy discrimination Markov networks (MaxEnDNet, or simply MEDN), which integrates these two approaches and combines and extends their merits. The major innovations of this approach are:

1) It extends conventional maximum entropy discrimination learning of classification rules to a new structured maximum entropy discrimination paradigm that learns a distribution over Markov networks.

2) It generalizes the existing Markov network structured-prediction rule, which is based on a point estimate of the model coefficients, to an averaging model akin to a Bayesian predictor that integrates over a learned posterior distribution of the model coefficients.

3) It admits flexible entropic regularization of the model during learning. By plugging in different prior distributions over the model coefficients, it subsumes the well-known maximum margin Markov networks (M3N) as a special case, and it leads to a model similar to an L1-regularized M3N that is simultaneously primal and dual sparse, as well as to other new types of Markov networks.

4) It admits a modular learning algorithm that combines existing variational inference techniques and convex-optimization-based M3N solvers as subroutines.

Essentially, MEDN can be understood as a joint maximum likelihood and maximum margin estimate of Markov networks. It represents the first successful attempt to combine maximum entropy learning (a dual form of maximum likelihood learning) with maximum margin learning of Markov networks for structured input/output problems, and the basic principle can be generalized to learning arbitrary graphical models, such as generative Bayesian networks or models with structured hidden variables. We discuss a number of theoretical properties of this approach, and we show empirically that it outperforms a wide array of competing methods for structured input/output learning on both synthetic data and real OCR and web data extraction data sets.
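To make points 1)-3) concrete, the following is a minimal sketch of the learning problem and the averaging prediction rule the abstract describes, written in standard maximum entropy discrimination form; the specific symbols used here (the feature function F, the margin term \Delta F_i, the structured loss \Delta\ell_i, and the slack penalty U(\xi)) are notational assumptions for illustration, not quoted from the paper's body. Rather than a point estimate of the coefficients \mathbf{w}, MaxEnDNet learns a distribution p(\mathbf{w}) close to a prior p_0(\mathbf{w}) subject to expected margin constraints:

\min_{p(\mathbf{w}),\,\xi}\; \mathrm{KL}\!\left(p(\mathbf{w})\,\|\,p_0(\mathbf{w})\right) + U(\xi)

\text{s.t.}\quad \int p(\mathbf{w})\left[\Delta F_i(y;\mathbf{w}) - \Delta\ell_i(y)\right]\mathrm{d}\mathbf{w} \;\ge\; -\xi_i, \qquad \forall i,\; \forall y \neq y^i,

where \Delta F_i(y;\mathbf{w}) = F(\mathbf{x}^i, y^i;\mathbf{w}) - F(\mathbf{x}^i, y;\mathbf{w}) is the margin of the true structured label y^i over an alternative y, \Delta\ell_i(y) is a structured loss, and U(\xi) is a convex slack penalty. Prediction then averages over the learned posterior rather than using a single parameter vector:

h(\mathbf{x}) = \arg\max_{y} \int p(\mathbf{w})\, F(\mathbf{x}, y;\mathbf{w})\, \mathrm{d}\mathbf{w}.

Consistent with point 3), a standard Gaussian prior p_0(\mathbf{w}) = \mathcal{N}(0, I) recovers M3N as a special case, while a Laplace prior yields the simultaneously primal- and dual-sparse variant.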