Structural inference for semistructured data

  • Authors: Jason Sankey; Raymond K. Wong

  • Affiliations: University of Sydney, Sydney, Australia; University of New South Wales, Sydney, Australia

  • Venue: Proceedings of the tenth international conference on Information and knowledge management
  • Year: 2001

Abstract

Semistructured data presents many challenges, mainly due to its lack of a strict schema. These challenges are further magnified when large amounts of data are gathered from heterogeneous sources. We address this by investigating and developing methods that automatically infer structural information from example data. Using XML as a reference format, we approach the schema generation problem by applying inductive inference theory. In doing so, we review and extend results on the search spaces of grammatical inference. We then adapt, from computational linguistics, a method for evaluating the result of an inference process. Further, we combine several inference algorithms, including both new techniques we introduce and those from previous work. Comprehensive experimentation shows our new hybrid method, based on recently developed optimisation techniques, to be the most effective of the methods evaluated.
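The abstract does not spell out the authors' algorithms. As a rough illustration of the grammatical-inference setting it refers to, the following Python sketch (an assumption, not the paper's method) collects, for each XML element name, the sequences of child tags seen in example documents, builds a prefix tree acceptor (PTA) that accepts exactly those sequences, and then generalises it by merging PTA states. Candidate generalisations can be viewed as partitions of the PTA states; the single partition used here, which merges states entered by the same tag, is a hypothetical heuristic chosen for brevity. The names `child_sequences`, `build_pta`, `quotient` and `accepts` are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def child_sequences(xml_texts):
    """Map each element name to the child-tag sequences observed under it."""
    seqs = defaultdict(list)
    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            seqs[elem.tag].append(tuple(child.tag for child in elem))
    return seqs


def build_pta(samples):
    """Build a prefix tree acceptor (PTA) accepting exactly the sample sequences.

    Returns (transitions, accepting, incoming); state 0 is the start state and
    incoming[s] is the tag on the edge entering state s (None for the start).
    """
    transitions = {}          # (state, tag) -> state
    accepting = set()
    incoming = {0: None}
    for seq in samples:
        state = 0
        for tag in seq:
            if (state, tag) not in transitions:
                new_state = len(incoming)
                transitions[(state, tag)] = new_state
                incoming[new_state] = tag
            state = transitions[(state, tag)]
        accepting.add(state)
    return transitions, accepting, incoming


def quotient(transitions, accepting, block_of):
    """Merge PTA states by a partition (block_of maps a state to its block).

    Returns (transitions, accepting, start) for the merged automaton, or raises
    ValueError if the chosen partition makes it non-deterministic.
    """
    merged = {}
    for (src, tag), dst in transitions.items():
        src_block, dst_block = block_of(src), block_of(dst)
        if merged.setdefault((src_block, tag), dst_block) != dst_block:
            raise ValueError("partition yields a non-deterministic automaton")
    return merged, {block_of(s) for s in accepting}, block_of(0)


def accepts(automaton, seq):
    """Run a deterministic automaton given as (transitions, accepting, start)."""
    transitions, accepting, start = automaton
    state = start
    for tag in seq:
        if (state, tag) not in transitions:
            return False
        state = transitions[(state, tag)]
    return state in accepting


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><author/></book>",
    ]
    trans, acc, incoming = build_pta(child_sequences(docs)["book"])
    pta = (trans, acc, 0)
    # Hypothetical generalisation: merge PTA states entered by the same tag.
    general = quotient(trans, acc, lambda s: incoming[s])
    three_authors = ("title", "author", "author", "author")
    print(accepts(pta, three_authors))      # False: the PTA only accepts the samples
    print(accepts(general, three_authors))  # True: the merged automaton accepts title author+
```

From the two `book` examples, the PTA accepts only the observed child sequences, while the merged automaton accepts the language `title author+`, roughly the content model a schema writer would choose. Other partitions yield other candidate schemas, some too general and some too specific; deciding among them is where an evaluation measure and a search strategy come in.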