Extracting Information from Semistructured Data

  • Authors: Liping Ma; John Shepherd; Yanchun Zhang

  • Venue: WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
  • Year: 2002

Abstract

This paper describes work towards automatically building online structured information resources from sources that consist largely of natural language but follow some structuring conventions. The conversion requires two phases: identifying the regions of each incoming document, and mapping the information those regions contain into a more structured form. We describe a system that uses decision-tree-based machine learning to build a classifier that accurately identifies document regions, and we discuss pattern-discovery methods for extracting information from the identified regions. Experiments demonstrate that this approach works well.
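
To make the region-identification phase concrete, the following is a minimal sketch of a decision-tree classifier that labels individual lines of a document. It is not the authors' system: the abstract does not name the induction algorithm, the features, or the region labels, so the ID3-style learner, the four boolean features, the labels (heading, list, body), and the training lines here are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def features(line):
    """Hypothetical boolean layout features for one line (not from the paper)."""
    text = line.strip()
    return {
        "is_short": len(text.split()) <= 4,
        "all_caps": text.isupper(),
        "starts_with_bullet": text.startswith(("-", "*")),
        "ends_with_period": text.endswith("."),
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, names):
    """ID3-style induction over boolean features.

    Returns either a label (leaf) or a tuple (feature_name, {True: ..., False: ...}).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not names:
        return majority

    def split_entropy(name):
        # Weighted entropy of the two partitions induced by this feature.
        total = 0.0
        for value in (True, False):
            subset = [l for r, l in zip(rows, labels) if r[name] == value]
            if subset:
                total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(names, key=split_entropy)
    rest = [n for n in names if n != best]
    branches = {}
    for value in (True, False):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if pairs:
            sub_rows = [r for r, _ in pairs]
            sub_labels = [l for _, l in pairs]
            branches[value] = build_tree(sub_rows, sub_labels, rest)
        else:
            # No training example on this branch: fall back to the majority label.
            branches[value] = majority
    return (best, branches)


def classify(tree, line):
    """Walk the tree from the root to a leaf using the line's features."""
    row = features(line)
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches[row[name]]
    return tree


# Invented training lines with invented region labels.
TRAINING = [
    ("COURSE OUTLINE", "heading"),
    ("ASSESSMENT", "heading"),
    ("- weekly lab exercises", "list"),
    ("* final exam worth 60 marks", "list"),
    ("This course introduces the design of database systems.", "body"),
    ("Students are expected to attend all lectures.", "body"),
]

rows = [features(text) for text, _ in TRAINING]
labels = [label for _, label in TRAINING]
tree = build_tree(rows, labels, list(rows[0]))

for line in ["REFERENCES", "- bring a calculator", "Lectures run for twelve weeks."]:
    print(line, "->", classify(tree, line))
```

Once lines carry region labels, a second step can apply region-specific extraction patterns to each labelled span, which corresponds to the mapping phase described in the abstract.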