Inferring structure in semistructured data

  • Authors:
  • Svetlozer Nestorov;Serge Abiteboul;Rajeev Motwani

  • Affiliations:
  • Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 1997

Abstract

When dealing with semistructured data such as that available on the Web, it becomes important to infer the inherent structure, both for the user (e.g., to facilitate querying) and for the system (e.g., to optimize access). In this paper, we consider the problem of identifying some underlying structure in large collections of semistructured data. Since we expect the data to be fairly irregular, this structure consists of an approximate classification of objects into a hierarchical collection of types. We propose a notion of a type hierarchy for such data, and outline a method for deriving the type hierarchy, and rules for assigning types to data elements.
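
The abstract does not spell out the method, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each object can be summarized by the set of its outgoing edge labels, treats every label set shared by enough objects as a candidate type, orders types by set inclusion, and assigns irregular objects to the nearest type. The toy data, the `min_support` threshold, and the function names `infer_types`, `assign_type`, and `hierarchy` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: object id -> set of outgoing edge labels.
objects = {
    "p1": {"name", "email"},
    "p2": {"name", "email"},
    "p3": {"name", "email", "phone"},
    "p4": {"name", "email", "phone"},
    "p5": {"name", "email", "fax"},   # irregular object, no exact type
    "c1": {"title", "year"},
    "c2": {"title", "year"},
}


def infer_types(objects, min_support=2):
    """Keep each label set shared by at least min_support objects as a type."""
    counts = Counter(frozenset(labels) for labels in objects.values())
    return {t for t, n in counts.items() if n >= min_support}


def assign_type(labels, types):
    """Pick the type with the smallest symmetric difference to the label set.

    Ties are broken by the sorted label list so the result is deterministic.
    """
    labels = frozenset(labels)
    return min(types, key=lambda t: (len(t ^ labels), sorted(t)))


def hierarchy(types):
    """Map each type to the types whose label sets are proper subsets of it."""
    return {t: {s for s in types if s < t} for t in types}


types = infer_types(objects)
assignment = {oid: assign_type(labels, types) for oid, labels in objects.items()}
parents = hierarchy(types)
```

In this toy run, `p5` has no exact match and is assigned to the `{name, email}` type, which also appears as the more general type above `{name, email, phone}` in the hierarchy. A real collection would need a richer object description (for example, labels of incoming edges and the types of neighbors) and a more careful distance measure.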