Structural proximity searching for large collections of semi-structured data

  • Authors:
  • Michael Barg; Raymond K. Wong

  • Affiliations:
  • University of New South Wales, Sydney, Australia; University of New South Wales, Sydney, Australia

  • Venue:
  • Proceedings of the tenth international conference on Information and knowledge management
  • Year:
  • 2001

Abstract

The richness of the XML data format allows data to be structured in a way that precisely captures the semantics required by the author. It is the structure of the data, however, that forms the basis of all XML query languages. Without at least some notion of the structure, a user cannot meaningfully query the data. This problem is compounded when one considers that heterogeneous data adhering to different schemas are likely to exist in the database(s) being queried. This paper proposes a solution based on an efficient proximity index. In particular, we describe a family of encoding and compression schemes that enable us to build an index to efficiently implement the proximity search. Our index is extremely small, and can reflect updates in the underlying database in modest time. Experiments show that our algorithm and implementation are fast and scale well.