Information extraction using two-phase pattern discovery

  • Authors:
  • Liping Ma;John Shepherd

  • Affiliations:
  • The University of New South Wales, Sydney, Australia;The University of New South Wales, Sydney, Australia

  • Venue:
  • Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a new two-phase pattern (2PP) discovery technique for information extraction. 2PP consists of orthographic pattern discovery (OPD) and semantic pattern discovery (SPD) where the OPD determines the structural features from an identified region of a document and the SPD discovers a dominant semantic pattern for the region via inference, apposition and analogy. Then the discovered pattern is applied back into the region to extract required data items through pattern matching. We evaluated 2PP using 6500 data items and obtained effective result.