Conditional Random Fields (CRFs) are a widely used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependencies between labels. Even for the simple linear-chain model, taking structure into account implies a number of parameters and a computational effort that grow quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hundreds of output labels and several billion features. Efficiency stems here from the sparsity induced by the use of an ℓ1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve accuracy while delivering compact parameter sets.
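As a rough illustration of the two ideas in the abstract, the sketch below counts label-pair weights in a linear-chain model and shows how a sparsity-inducing ℓ1 penalty sets small weights exactly to zero. It uses a generic proximal-gradient (soft-thresholding) update, which is an assumption made for illustration and is not claimed to be one of the three strategies compared in the paper. All function names and parameters (`bigram_weight_count`, `soft_threshold`, `l1_proximal_step`, `lr`, `l1`) are hypothetical.

```python
def bigram_weight_count(num_labels, num_observation_tests):
    """Number of weights when every observation test is paired with every
    (previous label, current label) pair: quadratic in the label set size."""
    return num_labels * num_labels * num_observation_tests


def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l1_proximal_step(weights, grads, lr, l1):
    """One illustrative update: a gradient step on the smooth loss,
    followed by soft-thresholding with threshold lr * l1."""
    return [soft_threshold(w - lr * g, lr * l1) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # 100 labels and 1000 observation tests already give 10 million weights.
    print(bigram_weight_count(100, 1000))
    # Zero gradients isolate the effect of the penalty on three toy weights.
    updated = l1_proximal_step([1.0, 0.125, -2.0], [0.0, 0.0, 0.0], lr=0.5, l1=0.5)
    print(updated)
    print(sum(1 for w in updated if w != 0.0), "non-zero weights remain")
```

Only the non-zero weights need to be stored or visited, which is the sense in which sparsity can yield compact parameter sets; how much sparsity is obtained in practice depends on the penalty strength and the data.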