Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1 regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be applied efficiently in SGD training, owing to the high dimensionality of the feature vectors and the fluctuations of the approximate gradients. We present a simple method that solves these problems by penalizing each weight according to the cumulative L1 penalty it should have received. We evaluate the effectiveness of our method on three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method produces compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
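The cumulative-penalty idea can be sketched as follows: track, for each weight, the total L1 penalty it has actually received so far (q) versus the total it should have received under exact regularization (u), and after each gradient step clip the weight toward zero by the difference, never letting it cross zero. This is a minimal NumPy sketch under that reading of the abstract; the function name and the demo values are illustrative, not taken from the paper.

```python
import numpy as np

def apply_cumulative_l1_penalty(w, q, u):
    """Clip each weight toward zero by its outstanding L1 penalty.

    w : weight vector, modified in place after the plain gradient update
    q : total penalty already applied to each weight, modified in place
    u : cumulative penalty each weight should have received so far
        (lambda times the sum of learning rates over all updates)
    """
    z = w.copy()  # weights before the penalty is applied
    pos = w > 0
    neg = w < 0
    # Positive weights may shrink at most to zero, never flip sign.
    w[pos] = np.maximum(0.0, w[pos] - (u + q[pos]))
    # Negative weights symmetrically grow toward zero.
    w[neg] = np.minimum(0.0, w[neg] + (u - q[neg]))
    # Record how much penalty was actually applied this step.
    q += w - z

# Illustrative values: after some gradient updates, the cumulative
# penalty u = 0.1 drives the smallest weight exactly to zero.
w = np.array([0.5, -0.3, 0.05])
q = np.zeros(3)
apply_cumulative_l1_penalty(w, q, u=0.1)
print(w)  # [0.4, -0.2, 0.0]
```

Because q remembers any penalty that was truncated at zero (here, only 0.05 of the 0.1 owed by the third weight was applied), a weight revived by later gradient updates is charged the remainder, which is what lets this online scheme approximate the exact batch L1 penalty and yield genuinely sparse models.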