Structured sparsity in structured prediction

  • Authors:
  • André F. T. Martins (Carnegie Mellon University, Pittsburgh, PA, and Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal)
  • Noah A. Smith (Carnegie Mellon University, Pittsburgh, PA)
  • Pedro M. Q. Aguiar (Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal)
  • Mário A. T. Figueiredo (Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal)

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Abstract

Linear models have enjoyed great success in structured prediction in NLP. While much progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1-regularization; both ignore the structure of the feature space, preventing practitioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.
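
The structured-sparsity regularizers the abstract refers to are group-Lasso-type penalties, which zero out entire groups of features at once rather than individual weights. As a minimal illustrative sketch (not the authors' implementation; the function name `prox_group_lasso` and the toy data are assumptions for illustration), the Python snippet below shows the proximal operator of a group-Lasso penalty, i.e., block soft-thresholding, which is the core step in proximal-gradient methods for such regularizers:

```python
import numpy as np

def prox_group_lasso(w, groups, lam):
    """Proximal operator of the group-Lasso penalty
    lam * sum_g ||w_g||_2 (block soft-thresholding).

    `groups` is a list of index arrays, one per feature group.
    Groups whose L2 norm falls below `lam` are zeroed out entirely;
    the rest are shrunk toward zero, group by group.
    """
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm <= lam:
            w[g] = 0.0                 # discard the whole group
        else:
            w[g] *= 1.0 - lam / norm   # shrink the group uniformly
    return w

# Toy usage: two groups of three weights each.
w = np.array([0.1, -0.2, 0.05, 2.0, -1.5, 0.7])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(prox_group_lasso(w, groups, lam=0.5))
# The first (small-norm) group is zeroed out entirely, while the
# second is only shrunk -- this whole-group selection is what makes
# the sparsity "structured", in contrast to plain L1-regularization.
```

This group-level behavior is what lets structural prior knowledge about the feature space (e.g., feature templates forming natural groups) be encoded directly in the regularizer.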