Top-Down Extraction of Semi-Structured Data

  • Authors:
  • Berthier Ribeiro-Neto;Alberto H. F. Laender;Altigran S. da Silva

  • Venue:
  • SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
  • Year:
  • 1999

Abstract

In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects by decomposing them into less complex objects until only atomic objects remain to be extracted. Through experimentation, we demonstrate that, given a small number of examples, our strategy is able to extract most of the objects present in a Web source given as input.
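
The top-down decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Atom` and `Complex` classes, the `extract` function, and the regular expressions are all hypothetical names chosen for illustration, and the patterns are written by hand here, whereas the paper derives its extraction knowledge from user-provided examples. The sketch only shows the recursive shape of the idea: a complex object is split into its parts, and each part is processed in turn until atomic values are reached.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Atom:
    """Hypothetical atomic object: a value captured by one regex group."""
    name: str
    pattern: str  # regex with exactly one capture group


@dataclass
class Complex:
    """Hypothetical complex object: a named group of parts.

    If `split` is non-empty, the text holds repeated instances of the
    object, separated by that regex; otherwise it holds one instance.
    """
    name: str
    parts: List[Union["Atom", "Complex"]]
    split: str = ""


def extract(node, text):
    """Extract `node` from `text` top-down, recursing until atoms are reached."""
    if isinstance(node, Atom):
        match = re.search(node.pattern, text)
        return match.group(1).strip() if match else None

    # Complex object: isolate each instance, then descend into its parts.
    if node.split:
        chunks = [c for c in re.split(node.split, text) if c.strip()]
    else:
        chunks = [text]
    objects = [{p.name: extract(p, chunk) for p in node.parts} for chunk in chunks]
    return objects if node.split else objects[0]


# Assumed toy input: two records separated by a blank line.
schema = Complex(
    "books",
    parts=[
        Atom("title", r"Title:\s*(.+)"),
        Complex(
            "author",
            parts=[
                Atom("first", r"Author:\s*(\w+)"),
                Atom("last", r"Author:\s*\w+\s+(\w+)"),
            ],
        ),
    ],
    split=r"\n\s*\n",
)
text = "Title: Dune\nAuthor: Frank Herbert\n\nTitle: Emma\nAuthor: Jane Austen\n"
books = extract(schema, text)
```

In this toy run, `books` is a list of two dictionaries, each holding a `title` string and a nested `author` dictionary with `first` and `last` atoms. A part that cannot be found yields `None` rather than raising an error, which is a design choice of the sketch and not something the abstract specifies.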