Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction

  • Authors:
  • Jun-Tae Kim; Dan I. Moldovan

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 1995

Abstract

This paper presents a method for the automatic acquisition of linguistic patterns that can be used for knowledge-based information extraction from texts. In the knowledge-based approach to information extraction, linguistic patterns play a central role in the recognition and classification of input texts. Although the knowledge-based approach has proved effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Manual creation of patterns is time-consuming and error-prone, even for a small application domain. To solve the scalability and portability problems, automatic acquisition of patterns must be provided. In this paper, we present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. A specialized representation of patterns, called FP-structures, has been defined. Patterns are constructed in the form of FP-structures from training texts, and the acquired patterns are then tuned through the generalization of semantic constraints. An inductive learning mechanism is applied in the generalization step. The PALKA system has been used to generate patterns for our information extraction system developed for the Fourth Message Understanding Conference (MUC-4). MUC-4 was an ARPA-sponsored competitive evaluation of text analysis systems. Experimental results with a set of news articles from MUC-4 are discussed.
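
The abstract does not define FP-structures or spell out the learning procedure, so the following is only a toy illustration of what "generalization of semantic constraints" could look like, not the PALKA algorithm itself. It assumes a small is-a concept hierarchy and generalizes the semantic constraint on one pattern slot to the most specific concept that covers all positive training fillers while excluding the negative ones. The hierarchy, the concept names, and the functions `ancestors` and `generalize` are all hypothetical.

```python
# Hypothetical is-a hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "terrorist": "human",
    "soldier": "human",
    "human": "animate",
    "dog": "animate",
    "animate": "thing",
    "building": "thing",
}


def ancestors(concept):
    """Return the chain from `concept` up to the root, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(positives, negatives=()):
    """Return the most specific concept covering every positive filler.

    Returns None if `positives` is empty, if no common concept exists,
    or if the chosen concept would also cover a negative filler.
    """
    if not positives:
        return None
    # Start from the first positive's ancestor chain and keep only the
    # concepts shared with every other positive (order is preserved).
    common = ancestors(positives[0])
    for concept in positives[1:]:
        seen = set(ancestors(concept))
        common = [c for c in common if c in seen]
    if not common:
        return None
    candidate = common[0]
    # Reject the generalization if it is too broad for the negatives.
    if any(candidate in ancestors(n) for n in negatives):
        return None
    return candidate


if __name__ == "__main__":
    # Slot fillers observed in (hypothetical) training examples.
    print(generalize(["terrorist", "soldier"], negatives=["dog"]))
```

In this sketch, a slot first seen with the filler "terrorist" and later with "soldier" would have its constraint relaxed to "human", which still excludes the negative filler "dog". A real system would need a richer hierarchy and a policy for what to do when no single concept separates the positives from the negatives.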