Infrastructure for open-domain information extraction

Authors:
Mihai Surdeanu;Sanda M. Harabagiu
Affiliations:
Language Computer Corporation, Dallas TX;Language Computer Corporation, Dallas TX
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002

Abstract

The problem of performing open-domain Information Extraction (IE) was historically tied to the problem of ad-hoc acquisition of extraction patterns. In this paper we show that pattern acquisition alone is not sufficient and that we also need to build new IE architectures that combine the role of linguistic patterns with coreference knowledge and ambiguous syntactic and semantic information. We present the implementation of a novel IE architecture, namely the CICERO system, and show (1) how both high precision and high recall were obtained for a variety of extraction domains, and (2) how textual information can be extracted for virtually any domain in a precise and reliable way. The evaluation of CICERO's performance shows a significant improvement over MUC IE systems.