Flexible text segmentation with structured multilabel classification

Authors:
Ryan McDonald;Koby Crammer;Fernando Pereira
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Citing 12
Cited 20

An Algorithm that Learns What‘s in a Name

Machine Learning - Special issue on natural language learning
Improved Boosting Algorithms Using Confidence-rated Predictions

Machine Learning - The Eleventh Annual Conference on computational Learning Theory
An introduction to support Vector Machines: and other kernel-based learning methods

An introduction to support Vector Machines: and other kernel-based learning methods
Parallel Optimization: Theory, Algorithms and Applications

Parallel Optimization: Theory, Algorithms and Applications
A new family of online algorithms for category ranking

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Chunking with support vector machines

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Shallow parsing with conditional random fields

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Detecting errors in discontinuous structural annotation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Efficient inference on sequence segmentation models

ICML '06 Proceedings of the 23rd international conference on Machine learning
Accurate max-margin training for structured output spaces

Proceedings of the 25th international conference on Machine learning
Information Extraction

Foundations and Trends in Databases
Handling Conjunctions in Named Entities

CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Advanced online learning for natural language processing

HLT-Tutorials '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Tutorial Abstracts
Recognising nested named entities in biomedical text

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Automatic code assignment to medical text

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Modeling latent-dynamic in shallow parsing: a latent conditional model with improved inference

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Domain adaptation with structural correspondence learning

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Revealing the structure of medical dictations with conditional random fields

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Loss-sensitive discriminative training of machine transliteration models

SRWS '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Global inference for sentence compression an integer linear programming approach

Journal of Artificial Intelligence Research
Database-text alignment via structured multilabel classification

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Nested named entity recognition

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Confidence in structured-prediction using confidence-weighted models

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Accelerated training of maximum margin Markov models for sequence labeling: a case study of NP chunking

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Goodness: a method for measuring machine translation confidence

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Natural Language Processing (Almost) from Scratch

The Journal of Machine Learning Research
Confidence-weighted linear classification for text categorization

The Journal of Machine Learning Research
Exploiting chunk-level features to improve phrase chunking

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguous segments. We evaluate the model on entity extraction and noun-phrase chunking and show that it is more accurate for overlapping and non-contiguous segments, but it still performs well on simpler data sets for which sequential tagging has been the best method.