Practical structured learning techniques for natural language processing

Authors:
Daniel Marcu;Harold Charles Daume, III
Affiliations:
University of Southern California;University of Southern California
Venue:
Practical structured learning techniques for natural language processing
Year:
2006


Abstract

Natural language processing is replete with problems whose outputs are highly complex and structured. The current state of the art in machine learning is not yet general enough to apply to the broad range of problems that arise in NLP. In this thesis, I present Searn (for “search-learn”), an approach to learning for structured outputs that is applicable to the wide variety of problems encountered in natural language (and, hopefully, to problems in other domains, such as vision and biology). To demonstrate Searn’s general applicability, I present applications in such diverse areas as automatic document summarization and entity detection and tracking. In these applications, Searn is empirically shown to achieve state-of-the-art performance.

Searn is based on an integration of learning and search. This contrasts with standard approaches that define a model, learn parameters for that model, and then use the model and the learned parameters to produce new outputs. In most NLP problems, the “produce new outputs” step includes an intractable computation, so one must employ a heuristic search procedure for the production step. Instead of shying away from search, Searn attacks it head on and considers structured prediction to be defined by a search problem. The corresponding learning problem then becomes natural: learn parameters so that search succeeds.

The two application domains I study most closely in this thesis are entity detection and tracking (EDT) and automatic document summarization. EDT is the problem of finding all references to people, places and organizations in a document and identifying their relationships. Summarization is the task of producing a short summary of either a single document or a collection of documents. These problems exhibit complex structure that cannot be captured and exploited using previously proposed structured prediction algorithms.
By applying Searn to these problems, I am able to learn models that benefit from complex, non-local features of both the input and the output. Such features would not be available to structured prediction algorithms that require model tractability. These improvements lead to state-of-the-art performance on standardized data sets with low computational overhead.

Searn operates by transforming structured prediction problems into a collection of classification problems, to which any standard binary classifier may be applied (for instance, a support vector machine or decision tree). In fact, Searn represents a family of structured prediction algorithms, each determined by the classifier and search space used.

From a theoretical perspective, Searn satisfies a strong fundamental performance guarantee: given a good classification algorithm, Searn yields a good structured prediction algorithm. Such theoretical results are possible for other structured prediction algorithms only when the underlying model is tractable. For Searn, I am able to state strong results that are independent of the size or tractability of the search space. This provides theoretical justification for integrating search with learning.
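
The reduction described in the abstract, in which structured prediction is treated as a search problem whose individual decisions are made by an ordinary classifier, can be illustrated with a small sketch. This is not the Searn algorithm as given in the thesis: it shows only a single pass in which the classifier is trained on search states visited by following the gold-standard labels, it uses a simple 0/1 notion of error, and it leaves out any iteration over learned policies. The task (toy entity tagging), the perceptron, the feature set, and every name in the code (`features`, `Perceptron`, `make_examples`, `train`, `greedy_decode`, `TOY_DATA`) are assumptions made for illustration.

```python
from collections import defaultdict


def features(words, i, prev_tag):
    """Describe a search state: the current word plus the previous decision."""
    w = words[i]
    return [
        "bias",
        "word=" + w.lower(),
        "cap=" + str(w[0].isupper()),
        "prev=" + prev_tag,  # feature on the partial output built so far
    ]


class Perceptron:
    """Tiny multiclass perceptron standing in for 'any standard classifier'."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.w = defaultdict(float)

    def score(self, feats, label):
        return sum(self.w[(f, label)] for f in feats)

    def predict(self, feats):
        return max(self.labels, key=lambda y: self.score(feats, y))

    def update(self, feats, gold):
        guess = self.predict(feats)
        if guess != gold:
            for f in feats:
                self.w[(f, gold)] += 1.0
                self.w[(f, guess)] -= 1.0


def make_examples(data):
    """Turn each structured example into one classification example per step.

    States are generated by following the gold tags (a reference policy).
    """
    examples = []
    for words, tags in data:
        prev = "<s>"
        for i, gold in enumerate(tags):
            examples.append((features(words, i, prev), gold))
            prev = gold
    return examples


def train(data, epochs=10):
    labels = {t for _, tags in data for t in tags}
    clf = Perceptron(labels)
    examples = make_examples(data)
    for _ in range(epochs):
        for feats, gold in examples:
            clf.update(feats, gold)
    return clf


def greedy_decode(clf, words):
    """Greedy left-to-right search: each step is one classifier call."""
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = clf.predict(features(words, i, prev))
        out.append(prev)
    return out


# Hypothetical toy data, invented for this sketch.
TOY_DATA = [
    ("Alice visited Paris".split(), ["PER", "O", "LOC"]),
    ("Bob visited London".split(), ["PER", "O", "LOC"]),
    ("Paris welcomed Alice".split(), ["LOC", "O", "PER"]),
]

if __name__ == "__main__":
    model = train(TOY_DATA)
    print(greedy_decode(model, "Bob visited Paris".split()))
```

Because the classifier sees the previous decision as a feature, the prediction step never needs a tractable model over whole outputs; it only needs to run the search. Swapping the perceptron for another classifier, or the left-to-right order for a different search space, gives a different member of the same general family.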