The standard maximum margin approach to structured prediction lacks a straightforward probabilistic interpretation of the learning scheme and the prediction rule. As a result, its unique advantages, such as dual sparsity and the kernel trick, cannot easily be combined with the merits of a probabilistic model, such as Bayesian regularization, model averaging, and the ability to handle hidden variables. In this paper, we present a new general framework called maximum entropy discrimination Markov networks (MaxEnDNet, or simply MEDN), which integrates these two approaches and combines and extends their merits. The major innovations of this approach are:

1) It extends conventional maximum entropy discrimination learning of classification rules to a new structured maximum entropy discrimination paradigm that learns a distribution over Markov networks.

2) It generalizes the existing Markov network structured-prediction rule, which is based on a point estimate of the model coefficients, to an averaging model akin to a Bayesian predictor that integrates over a learned posterior distribution of the model coefficients.

3) It admits flexible entropic regularization of the model during learning. By plugging in different prior distributions over the model coefficients, it subsumes the well-known maximum margin Markov networks (M3N) as a special case, and it leads to a model similar to an L1-regularized M3N that is simultaneously primal and dual sparse, as well as to other new types of Markov networks.

4) It admits a modular learning algorithm that combines existing variational inference techniques and convex-optimization-based M3N solvers as subroutines.

Essentially, MEDN can be understood as a joint maximum likelihood and maximum margin estimate of Markov networks. It represents the first successful attempt to combine maximum entropy learning (a dual form of maximum likelihood learning) with maximum margin learning of Markov networks for structured input/output problems, and the basic principle can be generalized to learning arbitrary graphical models, such as generative Bayesian networks or models with structured hidden variables. We discuss a number of theoretical properties of this approach, and we show empirically that it outperforms a wide array of competing methods for structured input/output learning on both synthetic data and real OCR and web data extraction data sets.
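To make points 1)-3) concrete, the following is a minimal sketch of the learning problem and the averaging prediction rule the abstract describes, written in standard maximum entropy discrimination form; the specific symbols used here (the feature function F, the margin term \Delta F_i, the structured loss \Delta\ell_i, and the slack penalty U(\xi)) are notational assumptions for illustration, not quoted from the paper's body. Rather than a point estimate of the coefficients \mathbf{w}, MaxEnDNet learns a distribution p(\mathbf{w}) close to a prior p_0(\mathbf{w}) subject to expected margin constraints:

\min_{p(\mathbf{w}),\,\xi}\; \mathrm{KL}\!\left(p(\mathbf{w})\,\|\,p_0(\mathbf{w})\right) + U(\xi)

\text{s.t.}\quad \int p(\mathbf{w})\left[\Delta F_i(y;\mathbf{w}) - \Delta\ell_i(y)\right]\mathrm{d}\mathbf{w} \;\ge\; -\xi_i, \qquad \forall i,\; \forall y \neq y^i,

where \Delta F_i(y;\mathbf{w}) = F(\mathbf{x}^i, y^i;\mathbf{w}) - F(\mathbf{x}^i, y;\mathbf{w}) is the margin of the true structured label y^i over an alternative y, \Delta\ell_i(y) is a structured loss, and U(\xi) is a convex slack penalty. Prediction then averages over the learned posterior rather than using a single parameter vector:

h(\mathbf{x}) = \arg\max_{y} \int p(\mathbf{w})\, F(\mathbf{x}, y;\mathbf{w})\, \mathrm{d}\mathbf{w}.

Consistent with point 3), a standard Gaussian prior p_0(\mathbf{w}) = \mathcal{N}(0, I) recovers M3N as a special case, while a Laplace prior yields the simultaneously primal- and dual-sparse variant.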