Wrap-Up: a trainable discourse module for information extraction

Authors:
Stephen Soderland; Wendy Lehnert
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA; Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
Journal of Artificial Intelligence Research
Year:
1994

Abstract

The vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text and produce a structured representation of selected information from it. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher-level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among the information extracted from the text. Previous corpus-based approaches were limited to lower-level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable: it not only decides automatically which classifiers are needed but also derives the feature set for each classifier automatically. Its performance equals that of a partially trainable discourse module that requires manual customization for each domain.
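To make the idea of a discourse module that picks its own classifiers and derives its own features more concrete, here is a minimal sketch. It is not Wrap-Up's actual design. It assumes an ID3-style decision tree as the learner and a hypothetical representation in which each extracted object is a dict of slot values. Each training instance is a pair of objects labeled with whether they should be linked. The sketch trains one classifier per pair of object types seen in training, and it derives its binary features from the slot values and slot agreements present in the data. The names `active_features`, `train_all`, `predict`, and the example data are all illustrative.

```python
import math
from collections import Counter

# Hypothetical representation (not from the paper): an extracted object is a
# dict of slot -> value; a training instance is (object_a, object_b, should_link).

def active_features(a, b):
    """Binary features that hold for one pair of extracted objects."""
    feats = {("first", s, v) for s, v in a.items()}
    feats |= {("second", s, v) for s, v in b.items()}
    # One feature per slot on which both objects carry the same value.
    feats |= {("same", s, None) for s in a.keys() & b.keys() if a[s] == b[s]}
    return feats

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    """ID3 over binary features; each row is the set of features active for it."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        yes = [lab for r, lab in zip(rows, labels) if f in r]
        no = [lab for r, lab in zip(rows, labels) if f not in r]
        remainder = sum(len(p) / len(labels) * entropy(p) for p in (yes, no) if p)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 1e-12:
        return majority  # no remaining feature separates the classes
    rest = [f for f in features if f != best]
    children = []
    for present in (True, False):
        idx = [i for i, r in enumerate(rows) if (best in r) == present]
        children.append(id3([rows[i] for i in idx], [labels[i] for i in idx], rest))
    # Internal node: (feature, subtree if present, subtree if absent).
    return (best, children[0], children[1])

def train(instances):
    rows = [active_features(a, b) for a, b, _ in instances]
    labels = [label for _, _, label in instances]
    # The feature set is derived from whatever the training data contains.
    features = sorted(set().union(*rows), key=repr)
    return id3(rows, labels, features)

def train_all(instances):
    """Decide which classifiers are needed: one per pair of object types."""
    groups = {}
    for a, b, label in instances:
        groups.setdefault((a["type"], b["type"]), []).append((a, b, label))
    return {key: train(group) for key, group in groups.items()}

def classify(tree, feats):
    while isinstance(tree, tuple):
        feature, if_present, if_absent = tree
        tree = if_present if feature in feats else if_absent
    return tree

def predict(models, a, b):
    # A type pair never seen in training gets no classifier: default to no link.
    tree = models.get((a["type"], b["type"]), False)
    return classify(tree, active_features(a, b))

TRAINING = [
    ({"type": "company", "name": "Toyota"}, {"type": "company", "name": "Toyota"}, True),
    ({"type": "company", "name": "Toyota"}, {"type": "company", "name": "Honda"}, False),
    ({"type": "company", "name": "Honda"}, {"type": "company", "name": "Honda"}, True),
    ({"type": "company", "name": "Honda"}, {"type": "person", "name": "Smith"}, False),
]

models = train_all(TRAINING)
print(predict(models, {"type": "company", "name": "Nissan"},
              {"type": "company", "name": "Nissan"}))  # → True
```

On this toy data the company/company tree learns to split on the derived "same name" feature, which no one specified by hand. The company/person classifier collapses to a single "no link" leaf because it saw only one negative example. A decision tree is used here because its splits are easy to inspect. Any learner that accepts automatically generated binary features could be substituted.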