Practical very large scale CRFs

Authors:
Thomas Lavergne;Olivier Cappé;François Yvon
Affiliations:
LIMSI -- CNRS;Télécom ParisTech, LTCI -- CNRS;Université Paris-Sud, LIMSI -- CNRS
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

Conditional Random Fields (CRFs) are a widely used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependencies between labels. Even for the simple linear-chain model, taking structure into account implies a number of parameters and a computational effort that grow quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hundreds of output labels and several billion features. Efficiency stems here from the sparsity induced by the use of an ℓ1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve accuracy while delivering compact parameter sets.
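
The two quantitative claims in the abstract can be illustrated with a minimal sketch. It assumes the penalty is the ℓ1 norm (sum of absolute values) and uses plain soft-thresholding, the standard proximal step for that penalty. It is not taken from the paper and is not presented as any of the three methods it compares. The function names, toy weights, and threshold value are hypothetical.

```python
def transition_parameter_count(num_labels):
    """Label-pair (transition) parameters of a linear-chain model:
    one per ordered pair of labels, hence quadratic in the label set size."""
    return num_labels * num_labels


def soft_threshold(w, lam):
    """Proximal step for lam * |w|: shrink w toward zero by lam,
    and set it exactly to zero when |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


# Hypothetical toy weights; lam is an illustrative penalty strength.
weights = [0.8, -0.05, 0.02, -1.3, 0.1]
lam = 0.1
shrunk = [soft_threshold(w, lam) for w in weights]

# Small weights become exact zeros, so only the nonzero ones need storing.
nonzero = sum(1 for w in shrunk if w != 0.0)
print(transition_parameter_count(100), nonzero)
```

With 100 labels the sketch counts 10,000 label-pair parameters per transition feature, and the thresholding step leaves only two of the five toy weights nonzero, which is the kind of compactness the abstract attributes to the penalty.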