On primal and dual sparsity of Markov networks
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Max-margin Markov networks (M3N) have shown great promise in structured prediction and relational learning. Due to the KKT conditions, the M3N enjoys dual sparsity. However, the existing M3N formulation does not enjoy primal sparsity, which is a desirable property for selecting significant features and reducing the risk of over-fitting. In this paper, we present an l1-norm regularized max-margin Markov network (l1-M3N), which enjoys dual and primal sparsity simultaneously. To learn an l1-M3N, we present three methods: projected sub-gradient, cutting-plane, and a novel EM-style algorithm based on an equivalence between the l1-M3N and an adaptive M3N. We perform extensive empirical studies on both synthetic and real data sets. Our experimental results show that: (1) l1-M3N can effectively select significant features; (2) l1-M3N performs as well as the pseudo-primal sparse Laplace M3N in prediction accuracy, while consistently outperforming other competing methods that enjoy either primal or dual sparsity; and (3) the EM-style algorithm is more robust than the other two in prediction accuracy and time efficiency.
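The projected sub-gradient approach can be illustrated concretely. The sketch below is not the paper's implementation: it uses multiclass classification as the simplest stand-in for a structured output space, 0/1 Hamming loss inside loss-augmented inference, and the sorting-based l1-ball projection of Duchi et al. (2008); the constrained (l1-ball) form stands in for the penalized l1 objective, to which it is equivalent for a suitable radius. The names `project_onto_l1_ball`, `eta`, and `radius` are illustrative, not from the paper.

```python
import numpy as np

def project_onto_l1_ball(v, z):
    """Euclidean projection of v onto {w : ||w||_1 <= z} (Duchi et al., 2008).

    Sorting-based O(n log n) variant; the hard l1 constraint is what
    produces exact zeros in the primal weights.
    """
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_subgradient_epoch(W, X, y, eta=0.1, radius=5.0):
    """One pass of projected sub-gradient on the structured hinge loss.

    W: (num_classes, num_features) weights; a fixed step size eta is used
    here for brevity, though a decaying schedule is more typical.
    """
    n_classes = W.shape[0]
    for x_i, y_i in zip(X, y):
        # Loss-augmented inference: argmax_y [Delta(y_i, y) + w . f(x_i, y)].
        # With Hamming (0/1) loss this adds 1 to every wrong label's score.
        scores = W @ x_i + (np.arange(n_classes) != y_i)
        y_hat = int(np.argmax(scores))
        if y_hat != y_i:
            # Sub-gradient of the hinge term: f(x_i, y_hat) - f(x_i, y_i).
            W[y_hat] -= eta * x_i
            W[y_i] += eta * x_i
        # Enforce the l1 constraint, the source of primal sparsity.
        W = project_onto_l1_ball(W.ravel(), radius).reshape(W.shape)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only 2 of 20 features matter
    W = np.zeros((2, 20))
    for _ in range(20):
        W = projected_subgradient_epoch(W, X, y)
    print("nonzero weights:", np.count_nonzero(W), "of", W.size)
```

In the paper's setting, the argmax over classes would be replaced by max-product (loss-augmented) inference over the Markov network; the sub-gradient update and projection have the same shape.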