Infrastructure for open-domain information extraction

  • Authors: Mihai Surdeanu; Sanda M. Harabagiu
  • Affiliations: Language Computer Corporation, Dallas, TX (both authors)
  • Venue: HLT '02 Proceedings of the second international conference on Human Language Technology Research
  • Year: 2002

Abstract

The problem of performing open-domain Information Extraction (IE) has historically been tied to the problem of ad-hoc acquisition of extraction patterns. In this paper we show that pattern acquisition alone is not sufficient, and that we also need to build new IE architectures that combine the role of linguistic patterns with coreference knowledge and ambiguous syntactic and semantic information. We present the implementation of a novel IE architecture, the CICERO system, and show (1) how both high precision and high recall were obtained for a variety of extraction domains, and (2) how textual information can be extracted for virtually any domain in a precise and reliable way. The evaluation of CICERO's performance shows a significant improvement over MUC IE systems.