The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss, because that loss is non-differentiable whenever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm, Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than L-BFGS on the analogous L2-regularized problem. We also present a proof that OWL-QN is guaranteed to converge to a globally optimal parameter vector.
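The abstract does not spell out how the non-differentiability at zero is handled. As a minimal sketch, assume the objective has the form f(x) = loss(x) + C * ||x||_1 with a smooth loss. One common way to get a usable descent direction for such an objective is a pseudo-gradient built from the one-sided partial derivatives at zero. The function names, the argument layout, and the constant `c` below are illustrative assumptions, not taken from the text above, and the sketch covers only this one ingredient rather than the full OWL-QN procedure.

```python
from typing import List


def l1_objective(loss: float, x: List[float], c: float) -> float:
    """Return loss + c * ||x||_1 for a precomputed smooth loss value."""
    return loss + c * sum(abs(v) for v in x)


def pseudo_gradient(x: List[float], grad_loss: List[float], c: float) -> List[float]:
    """Pseudo-gradient of f(x) = loss(x) + c * ||x||_1.

    grad_loss is the gradient of the smooth loss term only (hypothetical input).
    """
    result = []
    for xi, gi in zip(x, grad_loss):
        if xi > 0:
            result.append(gi + c)      # |x_i| has slope +1 on this side
        elif xi < 0:
            result.append(gi - c)      # |x_i| has slope -1 on this side
        else:
            left = gi - c              # left partial derivative at x_i = 0
            right = gi + c             # right partial derivative at x_i = 0
            if left > 0:
                result.append(left)    # decreasing x_i lowers the objective
            elif right < 0:
                result.append(right)   # increasing x_i lowers the objective
            else:
                result.append(0.0)     # zero is locally optimal in this coordinate
    return result


if __name__ == "__main__":
    x = [1.0, -2.0, 0.0, 0.0, 0.0]
    g = [0.5, 0.5, 0.25, 2.0, -3.0]
    print(pseudo_gradient(x, g, c=1.0))
```

A coordinate that sits at zero stays there unless the loss gradient exceeds `c` in magnitude, which is how L1 regularization yields sparse parameter vectors.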