Fast structural join with a location function

  • Authors:
  • Nan Tang;Jeffrey Xu Yu;Kam-Fai Wong;Haifeng Jiang

  • Affiliations:
  • The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;IBM Almaden Research Center, San Jose

  • Venue:
  • DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
  • Year:
  • 2006

Abstract

A structural join evaluates a structural relationship (parent-child or ancestor-descendant) between XML elements. It serves as an important computation unit in XML pattern matching, such as twig joins. There is much existing work on efficient structural joins. In particular, indexes can expedite structural joins by skipping unmatchable elements. A typical use of an index is to retrieve, for a given element, all its ancestor (or descendant) elements from an indexed set. However, we observe two possible limitations of such index probes, namely false hit and false locate. A false hit means that an index probe touches unnecessary data besides the real results; a false locate is a (wasted) probe that returns zero answers. Both false hits and false locates can negatively affect the efficiency of structural joins. In this paper, we challenge ourselves to develop a new structural join algorithm with no false hit and no false locate. We illustrate that the R-Tree has the no-false-hit property (in contrast to the B+-Tree) and hence is a good candidate for our algorithm. To avoid false locates, we propose a new function called Location, which tells the probing points that will result in matches. We design and implement the Location function using a space-efficient structure, and present our algorithm using the R-Tree together with the Location function. Extensive experiments show the efficiency of our algorithm.
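
To make the terms in the abstract concrete, the following is a minimal sketch of a structural join and of what a false locate looks like. It is not the paper's algorithm: the abstract does not specify how elements are encoded or how Location is built, so the sketch assumes the common (start, end, level) region encoding, and a plain linear scan stands in for an index probe. All names (`Element`, `is_ancestor`, `is_parent`, `probe_ancestors`, `structural_join`) are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    # Assumed region encoding: an element spans [start, end] in document
    # order and sits at depth `level`. This encoding is an assumption,
    # not something stated in the abstract.
    start: int
    end: int
    level: int


def is_ancestor(a: Element, d: Element) -> bool:
    """a is an ancestor of d when a's region strictly contains d's."""
    return a.start < d.start and d.end < a.end


def is_parent(a: Element, d: Element) -> bool:
    """Parent-child is ancestor-descendant restricted to adjacent levels."""
    return is_ancestor(a, d) and a.level + 1 == d.level


def probe_ancestors(indexed: List[Element], d: Element) -> List[Element]:
    # Stand-in for an index probe: a linear scan over the indexed set.
    # A real index (B+-Tree, R-Tree) would avoid touching every element.
    return [a for a in indexed if is_ancestor(a, d)]


def structural_join(
    ancestors: List[Element], descendants: List[Element]
) -> Tuple[List[Tuple[Element, Element]], int]:
    """Return all (ancestor, descendant) pairs and the number of probes
    that produced no answer (false locates, in the abstract's terms)."""
    pairs: List[Tuple[Element, Element]] = []
    false_locates = 0
    for d in descendants:
        matches = probe_ancestors(ancestors, d)
        if not matches:
            false_locates += 1  # a wasted probe with zero answers
        pairs.extend((a, d) for a in matches)
    return pairs, false_locates


ancestors = [Element(1, 10, 1), Element(2, 5, 2)]
descendants = [Element(3, 4, 3), Element(6, 7, 2), Element(11, 12, 1)]
pairs, false_locates = structural_join(ancestors, descendants)
print(len(pairs), false_locates)
```

In this toy input, the probe for the element at (11, 12) finds no ancestor, so it counts as one false locate. A function in the spirit of Location would let the join skip that probing point entirely; how the paper achieves this is not described in the abstract.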