Unsupervised discovery of extraction patterns for information extraction

  • Authors:
  • Satoshi Sekine;Ralph Grishman;Kiyoshi Sudo

  • Venue:
  • Unsupervised discovery of extraction patterns for information extraction
  • Year:
  • 2004

Abstract

The task of Information Extraction (IE) is to find specific types of information in natural language text. In particular, event extraction identifies instances of a particular type of event or fact (a particular “scenario”), including the entities involved, and fills a database which has been pre-defined for the scenario. As the number of documents available on-line has multiplied, event extraction has grown in importance for applications such as tracking terrorist activities from newswire sources and building a database of job postings from the Web. Linguistic contexts, such as predicate-argument relationships, have been widely used as extraction patterns to identify the items to be extracted from the text. The cost of creating extraction patterns for each scenario has been a bottleneck limiting the portability of information extraction systems to different scenarios, although there has been some research on semi-supervised pattern discovery procedures to reduce this cost. The challenge is to develop a fully automatic method for identifying extraction patterns for a scenario specified by the user. This dissertation presents a novel approach for the unsupervised discovery of extraction patterns for event extraction from raw text. First, we present a framework that allows the user to have a self-customizing information extraction system for his/her query: the Query-Driven Information Extraction (QDIE) framework. The input to the QDIE framework is the user's query: either a set of keywords or a narrative description of the event extraction task. Second, we assess the improvement in extraction pattern models. By considering the shortcomings of the prior work based on predicate-argument models and their extensions, we propose a novel extraction pattern model that is based on arbitrary subtrees of dependency trees. Third, we address the issue of portability across languages.
As a case study of the QDIE framework, we implemented a pre-CODIE system, a Cross-Lingual On-Demand Information Extraction system requiring minimal human intervention, which incorporates the QDIE framework as a component for pattern discovery. In addition, we assess the role of machine translation in cross-lingual information extraction by comparing translation-based implementations.
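
To make the subtree-based pattern model in the abstract more concrete, here is a minimal sketch that enumerates every connected subtree of a toy dependency tree as a candidate pattern. It is an illustration under assumptions, not the dissertation's implementation: the function names `rooted_subtrees` and `all_subtrees`, the dictionary representation of the tree, and the example sentence are all hypothetical, and the sketch performs no ranking or filtering of the candidates.

```python
from itertools import product

# Toy dependency tree (hypothetical example): head word -> list of dependents.
# Assumes every word is unique, so words can serve as node identifiers.
TREE = {
    "bombed": ["terrorists", "embassy"],
    "embassy": ["US"],
}


def rooted_subtrees(tree, node):
    """Return all connected subtrees rooted at `node`, each as a frozenset of nodes."""
    options_per_child = []
    for child in tree.get(node, []):
        # For each dependent: either leave it out (empty set)
        # or attach any subtree rooted at that dependent.
        options_per_child.append([frozenset()] + rooted_subtrees(tree, child))
    results = []
    for combo in product(*options_per_child):
        results.append(frozenset({node}).union(*combo))
    return results


def all_subtrees(tree):
    """Collect the connected subtrees rooted at every node of the tree."""
    nodes = set(tree) | {c for children in tree.values() for c in children}
    found = set()
    for node in nodes:
        found.update(rooted_subtrees(tree, node))
    return found


candidates = all_subtrees(TREE)
for pattern in sorted(candidates, key=lambda p: (len(p), sorted(p))):
    print(sorted(pattern))
```

In this toy tree, a predicate-argument pattern such as {bombed, embassy} is just one of the enumerated candidates, while a larger subtree such as {bombed, embassy, US} keeps context that a single predicate-argument pair would drop. A practical system would presumably need to bound subtree size and score candidates, since the number of subtrees grows quickly with tree size.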