Integration of document detection and information extraction

Authors:
Louise Guthrie;Tomek Strzalkowski;Wang Jin;Fang Lin
Affiliations:
Lockheed Martin Corporation;GE Corporate Research and Development, Schenectady, NY;GE Corporate Research and Development, Schenectady, NY;GE Corporate Research and Development, Schenectady, NY
Venue:
TIPSTER '96 Proceedings of a workshop held at Vienna, Virginia: May 6-8, 1996
Year:
1996

Abstract

We conducted a number of experiments to evaluate various modes of building an integrated detection/extraction system, using the SMART system as the baseline. The goal was to determine whether advanced information extraction methods can improve the recall and precision of document detection. We identified the following two modes of integration:

I. Extraction to Detection: broad-coverage extraction
1. Extraction step: identify concepts for indexing
2. Detection step 1: low recall, high initial precision
3. Detection step 2: automatic relevance feedback using top N retrieved documents to regain recall

II. Detection to Extraction: query-specific extraction
1. Detection step 1: high recall, low precision run
2. Extraction step: learn concept(s) from query and retrieved subcollection
3. Detection step 2: re-rank the subcollection to increase precision

Our integration effort concentrated on mode I and on the following issues:
1. use of shallow but fast NLP for phrase extraction and disambiguation in place of a full syntactic parser
2. use of existing MUC-6 extraction capabilities to index a retrieval collection
3. a mixed Boolean/soft match retrieval model
4. creation of a Universal Spotter algorithm for learning arbitrary concepts
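
The second detection step of mode I (automatic relevance feedback over the top N retrieved documents) can be illustrated with a minimal sketch. The abstract does not say which feedback formula was used, so the Rocchio-style expansion below is an assumption, as are the function names, the whitespace tokenizer, and the weights alpha and beta; it is not the authors' implementation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                  reverse=True)


def expand_query(query_vec, doc_vecs, top_n=1, alpha=1.0, beta=0.5):
    """Rocchio-style expansion: treat the top_n ranked documents as relevant
    and add their averaged term weights to the query (assumed weights)."""
    top = rank(query_vec, doc_vecs)[:top_n]
    expanded = {t: alpha * w for t, w in query_vec.items()}
    for d in top:
        for t, w in doc_vecs[d].items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / len(top)
    return expanded


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "joint venture between two steel companies",
    "d2": "steel companies announce merger agreement",
    "d3": "weather forecast for the weekend",
}
doc_vecs = {d: tf_vector(text) for d, text in docs.items()}

query_vec = tf_vector("joint venture")
first_pass = rank(query_vec, doc_vecs)             # precise initial run
feedback_query = expand_query(query_vec, doc_vecs, top_n=1)
second_pass = rank(feedback_query, doc_vecs)       # feedback run
```

In this toy collection the initial query shares no terms with d2, so d2 scores zero on the first pass; after expansion with terms from the top-ranked document, d2 receives a positive score, which is the sense in which feedback "regains recall".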