Representing Web Data as Complex Objects

  • Authors:
  • Alberto H. F. Laender; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Elaine E. Silva

  • Venue:
  • EC-WEB '00 Proceedings of the First International Conference on Electronic Commerce and Web Technologies
  • Year:
  • 2000

Abstract

The popularization of the Web has made a huge volume of data available to a large audience. On many Web sites, such as bookstores, electronic catalogs, and travel agencies, the pages are documents composed of pieces of data whose overall structure can be easily recognized. Such pages are called data-rich and can be seen as collections of complex objects. In this paper, we show how such objects can be represented by nested tables, which are simple, intuitive, and convenient for expressing their implicit structure. The assumption is that, for most sites of interest, only a few examples are required to reveal the structure of the objects. To corroborate this assumption, we describe a data extraction tool that adopts this approach and present the results of experiments carried out with it.
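
To make the idea of a nested table concrete, the following is a minimal sketch in Python and not the tool described in the paper. It assumes a nested table can be modeled as a list of rows, where each cell holds either an atomic value or another nested table. The book data, the column names, and the helper functions schema and flatten are all hypothetical and serve only as illustration.

```python
# Hypothetical data: two "book" objects as they might appear on a
# data-rich bookstore page. Each row is a dict; the "authors" cell
# holds an inner table, which is what makes the table nested.
books = [
    {
        "title": "Example Book A",
        "price": 25.0,
        "authors": [
            {"name": "Author One"},
            {"name": "Author Two"},
        ],
    },
    {
        "title": "Example Book B",
        "price": 40.0,
        "authors": [{"name": "Author Three"}],
    },
]


def schema(table):
    """Infer the implicit structure of a nested table.

    Returns a dict mapping each column to "atomic" or, for a column
    holding an inner table, to the schema of that inner table.
    """
    result = {}
    for row in table:
        for column, value in row.items():
            if isinstance(value, list):
                merged = result.get(column)
                if not isinstance(merged, dict):
                    merged = {}
                # Merge the inner schema seen in this row with earlier rows.
                merged.update(schema(value))
                result[column] = merged
            else:
                result.setdefault(column, "atomic")
    return result


def flatten(table):
    """Unnest a nested table into flat rows, one per inner-row combination.

    Inner columns are prefixed with the outer column name. A row whose
    inner table is empty yields no flat rows.
    """
    flat = []
    for row in table:
        partial = [{}]
        for column, value in row.items():
            if isinstance(value, list):
                inner = flatten(value)
                partial = [
                    {**p, **{f"{column}.{k}": v for k, v in r.items()}}
                    for p in partial
                    for r in inner
                ]
            else:
                partial = [{**p, column: value} for p in partial]
        flat.extend(partial)
    return flat


if __name__ == "__main__":
    print(schema(books))
    for flat_row in flatten(books):
        print(flat_row)
```

In this sketch, schema recovers the structure that the objects carry implicitly (a book has a title, a price, and a repeating group of authors), while flatten shows what is lost when the same objects are forced into a flat table: the title and price are repeated once per author.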