Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently applied to structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; in both cases, the dual corresponds to minimizing a convex function subject to simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation than the batch EG algorithm to reach a desired accuracy, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be applied efficiently to problems such as sequence learning or natural language parsing. We perform an extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
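To make the core update concrete, the following is a minimal sketch of a single EG step on the probability simplex, the building block behind both the batch and online variants described above. This is an illustration under stated assumptions, not the paper's full algorithm: the function name, the fixed step size, and the toy quadratic dual objective are all hypothetical choices for the example.

```python
import numpy as np

def eg_update(u, grad, eta):
    """One exponentiated gradient step on the probability simplex.

    u    : current dual variables, a point on the simplex (u >= 0, u.sum() == 1)
    grad : gradient of the convex dual objective at u
    eta  : step size (fixed here for simplicity; the paper analyzes its choice)
    """
    # Multiplicative update followed by renormalization keeps u on the simplex,
    # so the simplex constraints of the dual never need explicit projection.
    v = u * np.exp(-eta * grad)
    return v / v.sum()

# Toy usage (illustrative): minimize a convex quadratic Q(u) = 0.5 * u^T A u
# over the simplex, standing in for the log-linear or max-margin dual.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A @ A.T                      # positive semidefinite, so Q is convex
u = np.full(5, 0.2)              # uniform starting point on the simplex
for _ in range(200):
    u = eg_update(u, A @ u, eta=0.1)   # grad of Q at u is A @ u
print(u, 0.5 * u @ A @ u)
```

In the structured settings the paper targets, each training example carries one such distribution over its (exponentially many) candidate structures, and the key point of the factored EG updates is that this multiplicative step can be carried out implicitly over parts of the structure rather than over all structures explicitly.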