Primal sparse Max-margin Markov networks
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Sparsity is a desirable property in high-dimensional learning. The l1-norm regularization can lead to primal sparsity, while max-margin methods achieve dual sparsity. Combining these two approaches, an l1-norm max-margin Markov network (l1-M3N) can achieve both types of sparsity. This paper analyzes its connections to the Laplace max-margin Markov network (LapM3N), which inherits the dual sparsity of max-margin models but is only pseudo-primal sparse, and to a novel adaptive M3N (AdapM3N). We show that the l1-M3N is an extreme case of the LapM3N, and that the l1-M3N is equivalent to an AdapM3N. Based on this equivalence, we develop a robust EM-style algorithm for learning an l1-M3N. On both synthetic and real data sets, we demonstrate the advantages of models that are simultaneously (pseudo-) primal and dual sparse over models that enjoy only primal or only dual sparsity.
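The abstract's contrast between primal and dual sparsity hinges on a well-known property of l1 regularization: its proximal operator (soft-thresholding) sets small coefficients exactly to zero, which is what makes a primal-sparse model possible. The sketch below is purely illustrative of that mechanism, not the paper's EM-style algorithm; the weight values and threshold are hypothetical.

```python
def soft_threshold(w, lam):
    """Proximal operator of the l1-norm: shrinks w toward zero and
    sets any entry with |w| <= lam exactly to zero (primal sparsity)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

# Toy weight vector: large weights are shrunk, small ones zeroed out.
weights = [2.5, -0.3, 0.05, -1.7, 0.0]
lam = 0.5  # hypothetical regularization strength
sparse_weights = [soft_threshold(w, lam) for w in weights]
print(sparse_weights)  # -> [2.0, 0.0, 0.0, -1.2, 0.0]
```

An l2 (ridge) penalty, by contrast, only shrinks weights multiplicatively and never produces exact zeros, which is why the paper's pseudo-primal-sparse LapM3N is distinguished from the genuinely primal-sparse l1-M3N.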